(54) Directional asymmetric signal swing bus system for circuit module architecture



(57) A memory device which utilizes a plurality of memory modules coupled in parallel to a master I/O
module through a single directional asymmetrical signal swing (DASS) bus. This structure provides an I/O
scheme having symmetrical swing around half the supply voltage, high throughput, high data bandwidth, short
access time, low latency and high noise immunity. The memory device utilizes improved column access
circuitry including an improved address sequencing circuit and a data amplifier within each memory module.

The memory device includes a resynchronization circuit which allows the device to operate either
synchronously or asynchronously using the same pins.

Each memory module has independent address and command decoders to enable independent operation.
Thus, each memory module is activated by commands on the DASS bus only when a memory access
operation is performed within the particular memory module.

The memory device includes redundant memory modules to replace defective memory modules.
Replacement can be carried out through commands on the DASS bus.

The memory device can be configured to simultaneously write a single input data stream to multiple memory
modules or to perform high-speed interleaved read and write operations.

In one embodiment, multiple memory devices are coupled to a common, high-speed I/O bus without
requiring large bus drivers and complex bus receivers in the memory modules.
Description

BACKGROUND OF THE INVENTION

Field of the Invention

The present invention relates to a data processing system having a few bus masters and many bus slaves connected
in parallel to a common bus. In particular, this invention relates to low latency, high bandwidth, low power, high-yield, large
capacity memory devices suitable for data processing and video systems. This invention is particularly suitable for systems
organized into multiple identical modules in a very-large-scale or wafer-scale integration environment.



Description of the Prior Art

When transmitting signals on traditional bus systems, problems typically arise when either of the following conditions
exists: (i) the rise or fall time of the transmitted signal is a significant fraction of the bus clock period or (ii) there are
reflections on the bus of the signal which interfere with the rising or falling transitions of the signal. The data transfer rate
is limited in part by whether signal integrity is compromised as a result of the above conditions. Therefore, to increase
data bandwidth, it is desirable to avoid the above-listed conditions.

High frequency data transmission through a bus requires a high rate of electrical charge (Q) transfer on and off the
bus to achieve adequate rise and fall times. To avoid condition (i) above, large transistors in the bus drivers are needed
to source and sink the large amounts of current required to switch the signal levels. Equation (1) sets forth the relationship
between the required current drive capability (I) of the bus drivers, the number of devices (n) attached to the bus, the
output capacitance (C) of the bus driver, the signal swing (V) needed to distinguish between logical 1 and 0, and the
maximum operating frequency (f) of the bus.

I = nCVf          Eq. (1)

Thus, one way to obtain a higher operating frequency is to increase the drive capability of the bus driver. However,
higher drive usually requires a driver with larger size, which in turn translates to increased silicon area, bus capacitance,
power consumption and power supply noise. Furthermore, when the output capacitance of the bus driver becomes a
substantial part of the bus capacitance, increasing the size of the bus driver does not result in a higher operating frequency.

Another way to increase the operating frequency is to reduce the signal swing on the bus. Signal swing is defined
as the difference between the maximum voltage and the minimum voltage of the signals transmitted on the bus. Many
traditional bus systems, including the TTL standard, use reduced-swing signal transmission (i.e., signal swing smaller
than the supply voltage), to enable high speed operations. A reduced signal swing reduces the required charge transfer,
thereby reducing power consumption, noise and required silicon area. Because reduced signal swing substantially
reduces the current required from the bus driver, parallel termination of bus lines is facilitated. Parallel termination is an
effective way to suppress ringing in the bus. However, the use of small swing signals requires the use of sophisticated
amplifiers to receive the signals. As the signal swing decreases, the required gain of the amplifier increases, thereby
increasing the required silicon area and operating power. It would therefore be desirable to have a bus system which
utilizes small swing signals, but does not require the use of sophisticated amplifiers.
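The effect of signal swing on the drive requirement of Equation (1) can be illustrated numerically. The following is a minimal sketch only; the device count, driver capacitance, bus frequency and supply voltage are hypothetical values chosen for illustration and are not taken from this description.

```python
def required_drive_current(n_devices, c_farads, v_swing, f_hz):
    """Equation (1): I = n * C * V * f, returned in amperes."""
    return n_devices * c_farads * v_swing * f_hz

# Hypothetical values, for illustration only.
N_DEVICES = 18      # devices attached to the bus
C_DRIVER = 2e-12    # output capacitance of each bus driver, in farads
F_BUS = 250e6       # maximum bus operating frequency, in hertz

full_swing = required_drive_current(N_DEVICES, C_DRIVER, 3.3, F_BUS)
reduced_swing = required_drive_current(N_DEVICES, C_DRIVER, 1.0, F_BUS)

print(f"full swing (3.3 V):    {full_swing * 1e3:.1f} mA")
print(f"reduced swing (1.0 V): {reduced_swing * 1e3:.1f} mA")
```

Because the current in Equation (1) is linear in V, the ratio of the two results equals the ratio of the two swings, independent of the other assumed values.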

Prior art small swing (less than 1.5 V peak-to-peak) I/O (input/output) schemes generally have a logic threshold
voltage different from Vdd/2 (i.e., one-half of the supply voltage), the logic threshold of a conventional CMOS logic circuit.
The logic threshold, or trip point, of a bus signal is the voltage level which delineates a logical 1 from a logical 0. An
example of such a scheme is GTL, where a logic threshold of 0.8 volt is used. (R. Foss et al., IEEE Spectrum, Oct. 1992,
pp. 54-57, "Fast interfaces for DRAMs"). Other small swing I/O schemes, such as the center-tap terminated (CTT) Interface
(JEDEC Standard, JESD8-4, Nov. 1993), have a fixed threshold (e.g., 1.5 volts) which does not track with the supply
voltage. To use a bus signal having a logic threshold other than the CMOS logic threshold in a CMOS integrated circuit,
a translator circuit must be used to translate the I/O logic threshold to the conventional CMOS logic threshold. These
translators consume circuit real estate and power, introduce additional circuit delay and increase circuit complexity.

CMOS circuitry uses a logic threshold of Vdd/2 to permit the CMOS circuitry to operate with symmetrical noise
margins with respect to the power and ground supply voltages. This logic threshold also results in symmetrical inverter
output rise and fall times as the pull-up and pull-down drive capabilities are set to be approximately equal.

Traditional DRAM devices (ICs) are organized into arrays having relatively small capacities. For example, most
commercial 1M bit and 4M bit DRAM devices have an array size of 256K bit. This organization is dictated by the bit-line
sense voltage and word line (RAS) access time. However, all arrays inside a DRAM device share a common address
decoding circuit. The arrays in DRAM devices are not organized as memory modules connected in parallel to a common
bus. Furthermore, each memory access requires the activation of a substantial number (e.g., one quarter to one half)
of the total number of arrays, even though most of the activated arrays are not accessed. As a result, power is wasted
and the soft-error rate due to supply noise is increased.

Prior art DRAM schemes, such as Synchronous DRAM (JEDEC Standard, Configurations For Solid State Memories,
No. 21-C, Release 4, Nov. 1993) and Rambus DRAM (See, PCT Patent document PCT/US91/02590) have attempted
to organize the memory devices into banks. In the synchronous DRAM scheme, the JEDEC Standard allows only one
bit for each bank address, thereby implying that only two banks are allowed per memory device. If traditional DRAM
constraints on the design are assumed, the banks are formed by multiple memory arrays. The Rambus DRAM scheme
has a two bank organization in which each bank is formed by multiple memory arrays. In both schemes, due to the large
size of the banks, bank-level redundancy is not possible. Furthermore, power dissipation in devices built with either
scheme is at best equal to traditional DRAM devices. Additionally, because of the previously defined limitations, neither
the Synchronous DRAM scheme nor the Rambus DRAM scheme uses a modular bank architecture in which the banks
are connected in parallel to a common internal bus.

Many prior art memory systems use circuit-module architecture in which the memory arrays are organized into
modules and the modules are connected together with either serial buses or dedicated lines. (See, PCT patent document
PCT/GB86/00401, M. Brent, "Control System For Chained Circuit Modules" [serial buses]; and Yamashita, S. Ikehara,
M. Nagashima, and T. Tatematsu, "Evaluation of Defect-Tolerance Scheme in a 600M-bit Wafer-Scale Memory",
Proceedings on International Conference on Wafer Scale Integration, Jan. 1991, pp. 12-18 [dedicated lines]). In neither
case are the circuit modules connected in parallel to a common bus.

Prior art memory devices having a high I/O data bandwidth typically use several memory arrays simultaneously to
handle the high bandwidth requirement. This is because the individual memory arrays in these devices have a much
lower bandwidth capability than the I/O requirement. Examples of such prior art schemes include those described by
K. Dosaka et al., "A 100-MHz 4-Mb Cache DRAM with Fast Copy-Back Scheme", IEEE Journal of Solid-State Circuits,
Vol. 27, No. 11, Nov. 1992, pp. 1534-1539; and M. Farmwald et al., PCT Patent document PCT/US91/02590.

Traditional memory devices can operate either synchronously or asynchronously, but not both. Synchronous memories
are usually used in systems requiring a high data rate. To meet the high data rate requirement, synchronous
memory devices are usually heavily pipelined. (See, e.g., the scheme described in "250 Mbyte/s Synchronous DRAM
Using a 3-Stage-Pipelined Architecture", Y. Takai et al., IEEE JSSC, vol. 29, no. 4, April 1994, pp. 426-431.) The pipelined
architecture disclosed in Y. Takai et al. causes the access latency to be fixed at 3 clock cycles at all clock frequencies,
thereby making this synchronous memory device unsuitable for systems using lower clock frequencies. For example,
when operating at 50 MHz the device has an access latency of 60 ns (compared to an access latency of 24 ns when
operating at 125 MHz).
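The two latency figures quoted above follow directly from a pipeline depth that is fixed in clock cycles. A short sketch of the arithmetic is given below; the function name is illustrative only.

```python
def pipelined_latency_ns(clock_mhz, pipeline_cycles=3):
    """Access latency, in nanoseconds, of a pipeline whose depth is fixed in clock cycles."""
    cycle_ns = 1000.0 / clock_mhz
    return pipeline_cycles * cycle_ns

for clock_mhz in (50, 125):
    print(f"{clock_mhz} MHz: {pipelined_latency_ns(clock_mhz):.0f} ns")
```

The latency scales inversely with the clock frequency, which is why a device pipelined for a high clock rate performs poorly in a slower system.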

Conventional asynchronous memory devices, due to the lack of a pipeline register, maintain a fixed access latency 
at all operating frequencies. However, the access cycle time can seldom be substantially smaller than the access latency. 
Consequently, asynchronous devices are unsuitable for high data rate applications. 
Thus, it would be desirable to have a memory device which provides a high throughput, low latency, high noise
immunity I/O scheme which has a symmetrical swing around one half of the supply voltage.

It would also be desirable to have a memory device which can be accessed both synchronously and asynchronously 
using the same set of connection pins. 

Moreover, it would be desirable to have a memory device which provides a high data bandwidth and a short access
time.

It would also be desirable to have a memory device which is organized into small memory arrays, wherein only one 
array is activated for each normal memory access, whereby the memory device has low power dissipation.

Additionally, it would be desirable to have a memory device having small functionally independent modules, wherein a
defective module can be disabled and another module used to replace the defective module, resulting in a memory device
having a high defect tolerance.

It would also be desirable to have a memory device in which a single input data stream can be simultaneously written 
to multiple memory arrays and in which data streams from multiple memory arrays can be multiplexed to form a single 
output data stream. 

Furthermore, it would be desirable to have a memory device in which many memory modules are attached to a 
high-speed common bus without the necessity of large bus drivers and complex bus receivers in the modules.

SUMMARY OF THE INVENTION

The present invention implements a compact, high speed reduced CMOS swing I/O scheme which uses Vdd/2 as
the logic threshold. This scheme has the following advantages: (i) The logic threshold tracks with supply voltages, thereby
maintaining balance of pull-up and pull-down. (ii) The bus driver and receiver circuits work at a very wide range of supply
voltages without sacrificing noise immunity, since the thresholds of the bus driver and receiver circuits track with each
other automatically. (iii) The logic threshold is implicit in the logic circuit and does not require an explicit reference generator
circuit. (iv) Logic threshold translation is not necessary since the I/O logic threshold is identical to that of the other
logic circuitry on-chip.

The present invention groups at least two memory arrays or banks into a memory module and connects all the
memory modules in parallel to a common high-speed, directional asymmetrical signal swing (DASS) bus, thereby forming
a memory device. The memory modules transmit signals having a reduced swing to a master module coupled to the
DASS bus. In one embodiment, this reduced swing is equal to approximately one volt about a center voltage of Vdd/2,
where Vdd/2 is the threshold voltage of CMOS circuitry. The signal transmitted from the master device to the memory
modules has a full Vdd swing.

The memory modules are equipped with independent address and command decoders so that they function as
independent units, each with its own base address. This circuit-module architecture has several advantages: (i) It
allows each memory module to be able to replace any other memory module, thereby increasing the defect tolerance of
the memory device. (ii) It significantly reduces power consumption of the memory device when compared to traditional
memory devices because each memory access is handled completely by one memory module with only one of the
arrays activated. (iii) Since each memory module is a complete functional unit, the memory module architecture allows
parallel accesses and multiple memory module operations to be performed within different memory modules, thereby
increasing the performance of the memory device. (iv) The memory module architecture allows the memory device to
handle multiple memory accesses at the same time.

The circuit-module architecture of the present invention further allows easy system expansion by connecting multiple
memory devices in parallel through a common I/O bus which is an extension of the on-chip bus. In addition, by incorporating
redundant memory modules on each memory device and allowing each memory module to have a programmable
communication address on the I/O bus system, the resulting memory system has defect tolerance capability which is
better than that of each individual memory device.

In one embodiment of the present invention, the memory arrays include redundant rows and columns. Circuitry is
provided within the memory modules to support the testing of these redundant rows and columns. Circuitry is also
provided to replace defective rows and columns with the redundant rows and columns during operation of the memory
device.

The memory devices in accordance with the present invention are able to span address spaces which are not
contiguous by controlling the communication addresses of the memory modules. Furthermore, the address space
spanned by the memory devices can be dynamically modified both in location and size. This is made possible by the
incorporation, in each memory module, of a programmable identification (ID) register which contains the base address
of the memory module and a mechanism which decommissions the module from acting on certain memory access
commands from the bus. The present invention therefore provides for a memory device with dynamically reconfigurable
address space. Dynamically reconfigurable address space is especially useful in virtual memory systems in which a
very large logical address space is provided to user programs and the logical addresses occupied by the programs are
dynamically mapped to a much smaller physical memory space during program execution.

Each memory array in the present design is equipped with its own row and column address decoders and a special
address sequencer which automatically increments the address of the column to be accessed. Each memory array has
data amplifiers which amplify the signals read from the memory array before the signals are transmitted to the lines of
the DASS bus. Both the address sequencer and data amplifiers increase the signal bandwidth of the memory array.
Consequently, each memory array is capable of handling the I/O data bandwidth requirement by itself. This capability
makes multiple bank operations such as broadcast-write and interleaved-access possible. For example, a memory device
in accordance with the present invention is able to handle a broadcast-write bandwidth of over 36 gigabytes per second
and 36 memory operations simultaneously.

Memory devices in accordance with the present invention can be accessed both synchronously and asynchronously
using the same set of connection pins. This is achieved using the following techniques: (i) using a self-timed control in
connection with the previously described circuit-module architecture, (ii) connecting memory modules in parallel to an
on-chip bus which uses source synchronous clocking, (iii) using a half clock-cycle (single clock-transition) command
protocol, and (iv) using an on-chip resynchronization technique. This results in memory devices that have short access latency
(about 10 ns), and high data bandwidth (1 gigabyte/sec).

Another embodiment of the present invention provides for the termination of bus lines. In one embodiment, a passive
clamp for a bus line is created by connecting a first resistor between the bus line and a first supply voltage and connecting
a second resistor between the bus line and a second supply voltage. In one embodiment, the first supply voltage is Vdd,
the second supply voltage is ground, and the first and second resistors have the same resistance.
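The resting level set by such a two-resistor termination can be checked with ordinary voltage-divider arithmetic. The sketch below is illustrative only; the supply voltage and resistance values are hypothetical and the function names are not part of this description.

```python
def divider_open_circuit_voltage(v_first, v_second, r_first, r_second):
    """Voltage of an undriven bus line held by two termination resistors.

    r_first connects the line to v_first; r_second connects it to v_second.
    """
    return (v_first * r_second + v_second * r_first) / (r_first + r_second)

def divider_source_resistance(r_first, r_second):
    """Equivalent (Thevenin) resistance seen by the bus line: the resistors in parallel."""
    return r_first * r_second / (r_first + r_second)

# Hypothetical values, for illustration only.
VDD = 3.3      # first supply voltage, in volts
GROUND = 0.0   # second supply voltage, in volts
R = 100.0      # common resistance of both resistors, in ohms

resting_voltage = divider_open_circuit_voltage(VDD, GROUND, R, R)
termination_resistance = divider_source_resistance(R, R)
print(f"resting voltage: {resting_voltage:.2f} V")
print(f"termination resistance: {termination_resistance:.0f} ohms")
```

With equal resistors the undriven line rests at half the supply voltage, and the line sees half of the individual resistance as its termination.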

In an alternate embodiment, an active clamp for a bus line is created by connecting a p-channel transistor between
the bus line and a first supply voltage and connecting an n-channel transistor between the bus line and a second supply
voltage. The gates of the p-channel and n-channel transistors are driven in response to the bus line.

The present invention will be more fully understood in view of the following drawings taken together with the detailed
description.

BRIEF DESCRIPTION OF THE DRAWINGS

Fig. 1 is a block diagram of a memory device with a circuit-module architecture organized around a DASS bus;
Fig. 2a is a waveform diagram illustrating timing waveforms for asynchronous operations;
Fig. 2b is a waveform diagram illustrating timing waveforms for synchronous operations;
Fig. 3a is a schematic diagram of DASS bus transceivers;
Fig. 3b is a schematic diagram illustrating details of one of the bus transceivers shown in Fig. 3a;
Fig. 4 is a block diagram of a memory module in accordance with the present invention;
Fig. 5a is a block diagram of a memory array containing redundant rows and columns;
Fig. 5b is a schematic diagram of a circuit facilitating in-system testing and repair using redundant rows and columns;
Fig. 6 is a block diagram illustrating a data path in a column area of a conventional DRAM device;
Fig. 7 is a block diagram illustrating routing of column address and data lines in a conventional 4 M-bit DRAM device;
Fig. 8 is a block diagram illustrating column circuitry in accordance with one embodiment of the present invention;
Fig. 9 is a schematic diagram of column circuitry in accordance with one embodiment of the present invention;
Fig. 10 is a block diagram of a conventional address sequencing scheme;
Fig. 11a is a block diagram of an address sequencing scheme in accordance with the present invention;
Fig. 11b is a block diagram of one embodiment of the barrel shifter of Fig. 11a;
Fig. 11c is a schematic diagram of one of the flip-flops of the barrel shifter of Fig. 11b;
Fig. 12 is a block diagram of a resynchronization circuit in accordance with the present invention;
Fig. 13 is a schematic diagram of one embodiment of the FIFO of Fig. 12;
Fig. 14a is a schematic diagram of one embodiment of the latency counter of Fig. 12;
Fig. 14b is a schematic diagram of a latch used in the latency counter of Fig. 14a;
Fig. 15 is a waveform diagram illustrating timing waveforms of the resynchronization circuit of Fig. 12 when the device is operating synchronously;
Fig. 16 is a waveform diagram illustrating timing waveforms of the resynchronization circuit of Fig. 12 when the device is operating asynchronously;
Fig. 17 is a block diagram of a memory device configured for broadcast-write operation;
Fig. 18 is a waveform diagram illustrating sequencing of an interleaved access operation;
Fig. 19 is a block diagram of a memory system which includes a memory controller and multiple circuit-module memory devices connected in parallel through an I/O bus;
Fig. 20a is a schematic diagram of a reduced CMOS swing bus transceiver with active termination; and
Fig. 20b is a schematic diagram of a reduced CMOS swing bus transceiver with resistive termination.

DETAILED DESCRIPTION OF THE INVENTION



Conventional bus systems make no distinction in signal amplitude (swing) with respect to the direction of signal
transfer across the bus. The signal swing transmitted from one end of the bus is identical to that of a signal sent from
the other direction. In a bus system where there are substantially more slaves than masters, bus capacitance is dominated
by the bus drivers of communicating devices. This is especially true in a semiconductor (integrated circuit) environment
where the bus and the communicating devices are on the same chip.

Communication from masters to slaves is predominantly one-to-many (broadcast), and communication from slaves
to masters is one-to-one (dedicated). Using a small bus swing when slaves communicate to the masters allows the bus
driver of the slave device to be small. Reducing the slave bus driver size effectively reduces the bus capacitance, thereby
facilitating low power, high speed operation. The cost of incorporating amplifiers in the bus receivers of the masters is
relatively small because the number of masters is small. Using a large signal swing when masters communicate to the
slaves avoids the high cost of amplifier circuits in the receivers of the slaves. Since the number of masters is small, using
relatively large bus drivers in the masters does not increase the bus capacitance substantially and thus has little effect
on the bus operating frequency.
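The capacitance argument above can be illustrated with a simple tally of driver output capacitance on one bus line. The sketch below is a rough model under stated assumptions: it counts only driver capacitance, and the per-driver capacitance values and the master and slave counts are hypothetical.

```python
def driver_capacitance_pf(n_masters, c_master_pf, n_slaves, c_slave_pf):
    """Total driver output capacitance on one bus line, in picofarads."""
    return n_masters * c_master_pf + n_slaves * c_slave_pf

# Hypothetical values, for illustration only.
N_MASTERS = 1
N_SLAVES = 18
C_LARGE_DRIVER_PF = 2.0   # driver sized for a full swing
C_SMALL_DRIVER_PF = 0.5   # driver sized for a reduced swing

# Conventional bus: every device uses a large driver.
symmetric = driver_capacitance_pf(N_MASTERS, C_LARGE_DRIVER_PF, N_SLAVES, C_LARGE_DRIVER_PF)
# Asymmetric bus: only the master keeps the large driver.
asymmetric = driver_capacitance_pf(N_MASTERS, C_LARGE_DRIVER_PF, N_SLAVES, C_SMALL_DRIVER_PF)

print(f"symmetric swing:  {symmetric:.1f} pF")
print(f"asymmetric swing: {asymmetric:.1f} pF")
```

Because the slaves outnumber the masters, shrinking the slave drivers removes most of the driver capacitance, while the large master driver contributes only a small fixed amount.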

DASS bus structure and protocol

Fig. 1 is a block diagram of a memory device 100 which utilizes a directional asymmetric signal swing (DASS) bus
102 to couple master I/O module 104 and slave memory modules 111-128 in parallel. Although the present invention is
described in connection with an embodiment having eighteen slave memory modules, it is understood that other numbers
of modules can be used. Master I/O module 104 has one side connected to DASS
bus 102 and another side connected to I/O bus 106. Slave memory modules 111-128 contain arrays of dynamic random
access memory (DRAM).

In one embodiment, DASS bus 102 has 16 bi-directional lines ADQ[15:0] for multiplexed address, data and control
information, 4 lines C[3:0] for control information, 2 lines Dm[1:0] for write-mask information, 1 line for source clock (Sck)
information and 1 line for destination clock (Dck) information. When referring to memory modules 111-128, the signals
on lines C[3:0], Dm[1:0], and Sck are inputs and the signal on line Dck is an output. No explicit memory module select
signal is used. Memory module select information is implicit in the memory address used to access memory modules
111-128.
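The line groups listed above can be tabulated as follows. This is a bookkeeping sketch only; the dictionary layout and the direction labels are illustrative, with directions given as seen from a memory module.

```python
# Number of lines in each DASS bus signal group, as listed for this embodiment.
DASS_BUS_LINES = {
    "ADQ": 16,  # bi-directional, multiplexed address, data and control
    "C": 4,     # control
    "Dm": 2,    # write mask
    "Sck": 1,   # source clock
    "Dck": 1,   # destination clock
}

# Direction of each group as seen from a memory module.
MODULE_DIRECTION = {
    "ADQ": "inout",
    "C": "in",
    "Dm": "in",
    "Sck": "in",
    "Dck": "out",
}

total_lines = sum(DASS_BUS_LINES.values())
for name, width in DASS_BUS_LINES.items():
    print(f"{name:<4}{width:>3}  {MODULE_DIRECTION[name]}")
print(f"total lines: {total_lines}")
```

No module-select line appears in the table because module selection is implicit in the memory address.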

All memory transactions are initiated by either I/O module 104 or by devices connected to I/O bus 106. In the former
case, I/O module 104 contains a memory controller. In the latter case, I/O module 104 acts as a repeater between I/O
bus 106 and DASS bus 102. A memory transaction is initiated with a command. A typical command requires 20 bits of
information carried on C[3:0] and ADQ[15:0]. Four bits are used to encode the operation to be performed, and depending
on the contents of the four command bits, the remaining sixteen bits can be a combination of the following: base (memory
module) address, bank address, row address, column address, command-code extension or control register data. Each
command issued is referenced to a particular transition of the clock, in this case, a low-to-high transition. Data is grouped
as half-words of 16 bits each. The DASS bus is capable of transferring one half-word at each clock transition (high-to-low
or low-to-high), facilitating dual-edge transfer. Essentially, this allows a 32-bit word to be transferred in one clock
cycle using a 16-bit data bus.
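The 20-bit command and the dual-edge data transfer can be sketched as simple bit manipulation. The sketch rests on two assumptions that this description does not specify: the four operation bits are placed above the sixteen payload bits when the command is held as a single integer, and the low half-word of a 32-bit word is transferred first. The function names are hypothetical.

```python
OPCODE_BITS = 4     # operation code, carried on C[3:0]
PAYLOAD_BITS = 16   # remaining command bits, carried on ADQ[15:0]
HALF_WORD_MASK = (1 << PAYLOAD_BITS) - 1

def pack_command(opcode, payload):
    """Combine a 4-bit operation code and 16 payload bits into one 20-bit value.

    Placing the operation code in the upper bits is an assumption of this sketch.
    """
    if not 0 <= opcode < (1 << OPCODE_BITS):
        raise ValueError("opcode must fit in 4 bits")
    if not 0 <= payload < (1 << PAYLOAD_BITS):
        raise ValueError("payload must fit in 16 bits")
    return (opcode << PAYLOAD_BITS) | payload

def unpack_command(command):
    """Return (opcode, payload) from a 20-bit command value."""
    return command >> PAYLOAD_BITS, command & HALF_WORD_MASK

def split_word(word):
    """Split a 32-bit word into two half-words, one per clock transition.

    Sending the low half-word first is an assumption of this sketch.
    """
    return word & HALF_WORD_MASK, (word >> PAYLOAD_BITS) & HALF_WORD_MASK

def join_half_words(first, second):
    """Rebuild the 32-bit word from the two half-words of one clock cycle."""
    return (second << PAYLOAD_BITS) | first

command = pack_command(0x5, 0xBEEF)
first, second = split_word(0x12345678)
print(hex(command), hex(first), hex(second))
```

Since the whole command fits in the 20 lines sampled at one clock transition, no second transfer is needed before a module can begin decoding it.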

The command protocol accommodates both synchronous and asynchronous bus operations and minimizes both
the transfer overhead and the memory access latency. This is accomplished by sending the full operation code and
address in half of a clock cycle (minimum time unit on the bus). This minimizes the overhead of command transfer and
allows the access latency to be very close to the inherent latency of the memory. If the command takes multiple half
clock-cycles, the overhead also translates into access latency as most of the command information has to be received
before one of memory modules 111-128 can start the operation. For asynchronous operations, the clock signal functions
as a command and data strobe. Figs. 2a and 2b illustrate the timing of asynchronous and synchronous read operations,
respectively. In either case, the command signal is strobed and evaluated on every rising edge of the clk/strobe signal.

During an asynchronous operation (Fig. 2a), the falling edge of the clk/strobe signal does not occur until the access
latency of the memory has expired. When the falling edge of the clk/strobe signal occurs, the first half-word is read. After
the latency associated with accessing the second half-word has expired, the clk/strobe signal transitions from low to
high, thereby reading the second half-word. The latency for the second half-word is shorter than the latency for the first
half-word because the address of the second half-word is generated internal to the chip. In the foregoing manner, the
memory device is operated in a dual-edge transfer mode.

During synchronous operation (Fig. 2b), the first half-word signal is read during the second falling edge of the
clk/strobe signal after the command signal is detected. The memory device is again operated in a dual-edge transfer
mode, with the second half-word output occurring during the subsequent rising edge of the clk/strobe signal. Again, the
latency for the second half-word is shorter than the latency for the first half-word. More details on the memory operations
are discussed below.

Limiting bus commands to one half clock cycle seems to limit the memory address range to 64K. However, by taking 
advantage of the inherent characteristics of DRAM access, and separating the access into two micro-operations, the 
whole address does not need to be presented at the same time. The memory access operation will be discussed in 
detail in the memory-operation section. 

DASS bus drivers and receivers

Fig. 3a is a schematic diagram illustrating bus transceiver 302 of slave memory module 111 and bus transceiver
310 of master I/O module 104. Fig. 3b is a schematic diagram of bus transceiver 302 of memory module 111. Bus
transceiver 302 includes a bus driver 304 and a bus receiver 306. Bus driver 304 is a conventional CMOS inverter with
a PMOS transistor P10 for pull-up and an NMOS transistor N10 for pull-down. Similarly, bus receiver 306 is a conventional
CMOS inverter with a PMOS transistor P11 for pull-up and an NMOS transistor N11 for pull-down.

Bus line 308 of DASS bus 102 connects bus transceiver 302 with bus transceiver 310 in I/O module 104. Transceiver
310 includes bus receiver 312, bus driver 314, and clamping circuit 316. Clamping circuit 316 limits the signal swing on
bus line 308. Bus receiver 312 includes CMOS inverter 318 and bus driver 314 includes CMOS inverter 314. Clamping
circuit 316 includes n-channel field effect transistors N1-N4, p-channel field effect transistors P1-P4 and inverter 321.

Inverter 318 together with clamping circuit 316 form a single stage feedback amplifier which amplifies the signal on
bus line 308. The output of inverter 318 has a swing of approximately 0.5 volt to Vdd - 0.5 volt and is used to drive other
on-chip CMOS logic.

The operation of DASS bus 102 is dependent upon the bus transceivers 302 and 310. Bus transceivers 302 and
310 dictate operating speed, power dissipation and, to a large extent, the total die area. In accordance with one embodiment
of the present invention, I/O module 104 drives DASS bus 102 with a full Vdd (supply voltage) swing. Memory
modules 111-128 drive DASS bus 102 with a reduced CMOS swing of approximately 1 Volt centered around Vdd/2.

Bus receiver 312 operates in the following manner. When I/O module 104 is receiving and memory module 111 is
driving, a logic low signal is provided to clamp circuit 316 on lead 320. As a result, transistors P4 and N4 are turned on
and clamp circuit 316 is enabled. When the Read_data voltage at the input of inverter 304 is at ground, the output of
inverter 318 is at a voltage close to ground, transistor P3 is on, transistor N3 is off, transistor P2 is on, transistor N2 is
off, transistor N1 is on, and transistor P1 is off. Transistors N1 and N4 provide a conducting path from bus line 308 to
ground, thereby preventing the signal on bus line 308 from going to Vdd and clamping the voltage on bus line 308 at a
voltage of approximately Vdd/2 + 0.5 Volt.

When the Read_data voltage at the input of inverter 304 switches from ground to Vdd, transistor P10 (Fig. 3b) turns
off and transistor N10 turns on, thereby pulling bus line 308 towards ground. Transistor N1, still being on, accelerates
the pull down on bus line 308 until the logic threshold of inverter 318 is reached. At this time, the output of inverter 318
switches to high, turning transistors N2 and N3 on. In turn, transistor N2 turns off transistor N1 and transistor N3 turns
on transistor P1. Transistors P1 and P4 provide a conducting path between bus line 308 and Vdd, thereby clamping the
signal on bus line 308 at approximately Vdd/2 - 0.5 volt.

As the voltage on bus line 308 swings from one logic level to another, clamping does not switch direction until the
output of amplifier 318 finishes the logic transition. Clamping circuit 316, before it switches, accelerates the switching of
inverter 318. The voltage swing on bus line 308 can be adjusted by changing the size of clamping transistors N1, P1,
N4 and P4 or the driver transistors N10 and P10.

When I/O module 104 is driving and memory module 111 is receiving, a logic high signal is applied to lead 320.
Consequently, transistors P4 and N4 are turned off and clamp circuit 316 is disabled. Transistors P4 and N4 have channel
widths (sizes) two times larger than the channel widths of transistors P1 and N1, respectively. When the signal on line
320 is de-asserted, DC current in clamp circuit 316 and inverter 318 is eliminated. As a result, signals transmitted from
bus driver 314 to bus receiver 306 on bus line 308 have a full swing.

Memory module organization



The organization of memory module 111 in accordance with one embodiment of the present invention is illustrated
in Fig. 4. In this embodiment, memory modules 112-128 are identical to memory module 111. Memory module 111
contains two memory arrays 402a and 402b, each having 256K bits organized as 256 rows and 1024 columns. Memory
array 402a includes word line driver and decoder 404a, column decoder 406a, sense amplifier circuitry 408a, and column
select and data amplifier circuitry 410a. Similarly, memory array 402b includes word line driver and decoder 404b, column
decoder 406b, sense amplifier circuitry 408b, and column select and data amplifier circuitry 410b.

Memory arrays 402a and 402b share a common DASS memory bus interface 412 which connects memory module
111 to DASS bus 102. Bus interface 412 contains command decoding logic, timing control circuitry, address advancing
circuitry, and bus drivers and receivers. Bus interface 412 also contains two programmable registers: an identification
(ID) register 414, which stores the communication address of memory module 111, and an access-control register 416.
ID register 414 includes a module disable bit 420 which can be programmed by a command from DASS bus 102. As
described later, module disable bit 420 is dedicated for addressing redundant modules inside the memory device.

Address Mapping

Each memory module 111-128 incorporates a programmable ID register (e.g., ID register 414) which contains the
communication address of the respective module. A pre-programmed communication address is assigned to each of
memory modules 111-128. The communication address of each memory module 111-128 can be changed during system
operation by a command from DASS bus 102. Specifically, an ID write command is transmitted on DASS bus 102
to write the new communication address to the desired ID register.

The complete address to any memory location in any of memory modules 111-128 contains 4 fields. A first field
contains a base address which identifies the memory module by communication address. A second field contains an
address which identifies the memory array within the memory module. Third and fourth fields contain the addresses
which identify the desired row and column, respectively. The outputs of memory modules 111-128 are organized in
32-bit words.
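
The four-field address described above can be illustrated with a minimal sketch. The row and column widths follow from the 256-row, 1024-column arrays read out as 32-bit words (32 words per row), and the array field from the two arrays per module. The 7-bit base field, the order in which the fields are concatenated, and the names `Address`, `pack` and `unpack` are assumptions made only for illustration.

```python
from typing import NamedTuple

# ROW_BITS and COL_BITS follow from 256 rows and 1024 / 32 = 32 words per row.
# ARRAY_BITS reflects two arrays per module. BASE_BITS = 7 is an assumption
# (enough to distinguish 128 modules).
BASE_BITS, ARRAY_BITS, ROW_BITS, COL_BITS = 7, 1, 8, 5


class Address(NamedTuple):
    base: int    # communication address of the module (matches its ID register)
    array: int   # memory array inside the module
    row: int     # row within the array
    column: int  # 32-bit word within the row


def pack(addr: Address) -> int:
    """Concatenate the four fields, base address in the most significant bits (assumed order)."""
    word = 0
    for value, bits in ((addr.base, BASE_BITS), (addr.array, ARRAY_BITS),
                        (addr.row, ROW_BITS), (addr.column, COL_BITS)):
        if not 0 <= value < (1 << bits):
            raise ValueError("field out of range")
        word = (word << bits) | value
    return word


def unpack(word: int) -> Address:
    """Split a packed address back into its four fields."""
    column = word & ((1 << COL_BITS) - 1)
    word >>= COL_BITS
    row = word & ((1 << ROW_BITS) - 1)
    word >>= ROW_BITS
    array = word & ((1 << ARRAY_BITS) - 1)
    word >>= ARRAY_BITS
    return Address(base=word, array=array, row=row, column=column)
```

Because only the base field is compared against the ID register, reprogramming that register moves a whole module to a different place in the address space without touching the other three fields.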

The programmable base address provides memory modules 111-128 with dynamic address mapping capability by
allowing the communication addresses of memory modules 111-128 to be modified during operation of the memory
device.

In a system that contains 128 modules of 8K words, if the communication addresses of the memory modules are
consecutively assigned, a 4M byte contiguous memory is formed in which seven address bits can be used to address
the modules. In another application, a digital system may have distinct address spaces for a CPU (central processing
unit) and for a display processor. The two processors can reside on the same bus using the same memory subsystem
with some of the memory modules mapped to the CPU address space and the others mapped to the display processor
address space.

Redundancy

In accordance with one embodiment of the present invention, two levels of redundancy are employed in a memory
device using the circuit-module architecture described above. The first level of redundancy is memory module redundancy.
Thus, in one embodiment, memory module 111 may be used as a redundant memory module. In other embodiments,
an additional memory module, identical to memory modules 111-128, is coupled to DASS bus 102 and used as
a redundant memory module. The redundant memory module is included to allow replacement of any defective regular
module.

In an embodiment which uses memory module 111 as a redundant module, module disable bit 420 (Fig. 4) of module
111 is pre-programmed such that during normal operation of memory device 100, module 111 is disabled from participating
in any memory accesses. However, ID register 414 is still accessible through the bus interface 412. The module
disable bits of modules 112-128 are programmed such that these modules are enabled.

If one of the memory modules 112-128 fails during operation of memory device 100, the defective module is
decommissioned by programming the disable bit of its ID register. The redundant module 111 is activated by reprogramming
module disable bit 420 and writing the communication address of the defective module to ID register 414.
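
The module replacement sequence can be sketched as a small model. This is only an illustration: the `Module` class, the `replace_defective` helper and the choice of communication addresses (module number minus 111) are hypothetical and are not taken from the specification.

```python
class Module:
    def __init__(self, comm_address, disabled=False):
        self.comm_address = comm_address  # contents of the ID register
        self.disabled = disabled          # module disable bit

    def responds_to(self, base_address):
        """A module takes part in an access only if enabled and addressed."""
        return not self.disabled and self.comm_address == base_address


def replace_defective(modules, defective, spare):
    """Decommission `defective` and let `spare` take over its communication address."""
    address = modules[defective].comm_address
    modules[defective].disabled = True      # program the disable bit of the bad module
    modules[spare].comm_address = address   # ID write command addressed to the spare
    modules[spare].disabled = False         # reprogram the spare's disable bit


# Modules 111-128; module 111 starts out as the disabled spare.
# The address assignment n - 111 is an arbitrary choice for this sketch.
modules = {n: Module(comm_address=n - 111, disabled=(n == 111)) for n in range(111, 129)}
replace_defective(modules, defective=120, spare=111)
responders = [n for n, m in modules.items() if m.responds_to(9)]
```

After the replacement, accesses to the communication address formerly held by module 120 are answered only by module 111, so the rest of the system sees no change in the address map.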

The second level of redundancy is row and column redundancy. Redundant rows and columns are added to each
of the memory arrays in memory modules 111-128 for replacement of defective rows and columns in those memory arrays.

Fig. 5a is a block diagram of a memory module 500 having redundant memory sub-arrays 505, 506, 515 and 516.
Memory module 500 includes bus interface 520, ID register 521, access control register 503, repair row address registers
550 and 560, repair column address registers 551 and 561, and memory arrays 508 and 518. Memory array 508 includes
redundant row sub-array 505, redundant column sub-array 506 and regular memory array 507. Memory array 518
includes redundant row sub-array 515, redundant column sub-array 516 and regular memory array 517.

Test circuitry is included in memory module 500 so that redundant row sub-array 505, redundant column sub-array
506, redundant row sub-array 515, and redundant column sub-array 516 can be tested. Prior art memory redundancy
circuits test redundant memory sub-arrays (spare rows and columns) through the use of "tri-level" logic on certain input
pins (See, M. Hamada et al., "Semiconductor Memory Apparatus with a Spare Memory Cell Array", U.S. Pat. No.
5,113,371, incorporated by reference). However, the present invention eliminates the requirement of such a tri-level logic
arrangement.

In accordance with one embodiment of the present invention, two bits, T01 and T00, within access-control register
503 are dedicated as test-mode bits which allow the redundant row sub-array 505 and redundant column sub-array 506
to be tested. When either or both of test bits T01 and T00 are set, memory array 508 is placed in a test-mode and access
to the regular memory array 507 is disabled.

Table 1 sets forth the various test modes for memory array 508. 



Table 1

T01   T00   Result
0     0     Normal operation of memory array 508
0     1     Test redundant column array 506
1     0     Test redundant row array 505
1     1     Test both redundant column array 506 and redundant row array 505

In a similar manner, test bits T11 and T10 of access control register 503 are dedicated as test-mode bits for redundant
row sub-array 515 and redundant column sub-array 516 of memory array 518.

Fig. 5b is a schematic diagram illustrating circuitry used to generate enable signals for regular memory array 507,
redundant row sub-array 505 and redundant column sub-array 506. This circuitry includes flip-flops 510 and 511, write
enable lead 530, NOR gate 531, address comparators 560 and 561, repair row address register 550, repair column
address register 551, repair enable bits 540 and 541, AND gates 567 and 568, row address lead 565 and column address
lead 566.

The Q outputs of D-type flip-flops 510 and 511 are used to enable (or disable) redundant sub-arrays 505 and 506,
respectively (Fig. 5a). The Q outputs of flip-flops 510 and 511 are also provided to NOR gate 531 to generate a signal
which disables (or enables) regular memory array 507 (Fig. 5a). Thus, a high output on lead 532 enables redundant row
sub-array 505 and creates a low output on lead 534, thereby disabling memory array 507. Similarly, a high output on
lead 533 enables redundant column sub-array 506 and creates a low signal on lead 534, thereby disabling memory
array 507.

Test bits T01 and T00 can be programmed from the DASS bus (through bus interface 520). To program both test bits T01
and T00, bus interface 520 provides a logic high signal to the D inputs of flip-flops 510 and 511. In addition, bus interface
520 asserts a write enable signal on lead 530 (Fig. 5b), thereby causing test bits T01 and T00 to go high. This test-mode
circuitry allows for in-system testing of the redundant row and column sub-arrays 505 and 506.

The test-mode circuitry illustrated in Fig. 5b also facilitates the replacement of defective rows and columns with rows 
and columns of redundant row and column sub-arrays 505 and 506. The following example describes the replacement 
of a defective row. The replacement of a defective column is performed in substantially the same manner. 

To replace a defective row, the address of the defective row is written from bus interface 520 to repair row address 
register 550. The repair enable bit 540 of repair row address register 550 is set to a logic high state, thereby providing 
a high signal to one input of AND gate 567. The contents of repair row address register 550 are compared with the 
current row address received on row address lead 565 using address comparator 560. When the row address on lead 
565 matches the contents of repair row address register 550, the output of comparator 560 transitions to a high state, 
thereby causing AND gate 567 to provide a logic high signal to the Set and Reset inputs of flip-flop 510. As a result, the
Q output of flip-flop 510 transitions to a logic high state, thereby enabling redundant row sub-array 505 and disabling 
regular memory array 507. 
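
The enable logic of Fig. 5b, as described in the text, can be summarized in a simplified combinational sketch. It ignores the stateful behavior of the flip-flops and the grouping of repaired columns, and the function name `array_enables` and its argument convention (`None` meaning the repair enable bit is clear) are assumptions made for illustration.

```python
def array_enables(t01, t00, row_addr, col_addr, repair_row=None, repair_col=None):
    """Return (redundant_row_en, redundant_col_en, regular_en) as booleans.

    repair_row / repair_col hold the repair address register contents, or None
    when the corresponding repair enable bit is not set.
    """
    # Address comparator output gated by the repair enable bit (the AND gates).
    row_match = repair_row is not None and row_addr == repair_row
    col_match = repair_col is not None and col_addr == repair_col
    # A sub-array is enabled either by its test bit or by a repair address match.
    redundant_row_en = bool(t01) or row_match
    redundant_col_en = bool(t00) or col_match
    # NOR gate: the regular array is enabled only when neither sub-array is.
    regular_en = not (redundant_row_en or redundant_col_en)
    return redundant_row_en, redundant_col_en, regular_en
```

With both test bits clear and no repair address programmed, the function reproduces the first row of Table 1; setting either test bit, or presenting a repaired address, steers the access away from the regular array.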

In one embodiment of the present invention, the redundant row sub-arrays 505 and 515 each have one redundant
row, and the redundant column sub-arrays 506 and 516 each have 64 redundant columns. However, only one repair
column address register is provided for each memory array and the columns are repaired in groups of 64. The repair
enable bits 540 and 541 and the repair address registers 550 and 551 are incorporated as part of the access-control
register 503 and are programmable through a command from bus interface 520 (as previously discussed) or through a
fuse.

Memory Operations 

As in a conventional DRAM, an access to memory modules 111-128 is divided into two steps: a row access (RAS)
operation followed by a column access (CAS) operation. A RAS operation requires the base, array, and row addresses.
The RAS operation causes data in the designated row of the designated array to be transferred to the sense-amplifier
latches. A CAS operation requires the base, array and column addresses. The CAS operation causes the data stored
in the sense-amplifier latch designated by the column address to be input or output to DASS bus 102. Once data is
latched in the sense-amplifiers, subsequent accesses to the different locations of the same row can be carried out directly
using separate CAS operations without having to perform another RAS operation. Access to the sense-amplifier latches
is much faster than direct access to the memory cells because the sense-amplifiers have a much stronger signal drive.

In conventional DRAM, the RAS operation is signaled by a RAS control signal which must remain activated throughout
the RAS and CAS access. However, in the present invention, the RAS and CAS operations are signaled by a command
code on the control bus C[3:0]. The command code does not need to be maintained throughout the access
operation. In fact, once a RAS operation is performed, data latched in the sense amplifiers stays there until a precharge
operation is executed.

The precharge operation causes data in the sense-amplifier latches to be transferred to the row of DRAM cells
designated by the original RAS operation. The precharge operation also triggers equalization on the outputs of the
sense-amplifiers and the bit lines so that the memory array is prepared for the next RAS operation. As previously
described, only part of the memory address is needed for each memory operation. That is, the column address is not
needed in a RAS operation and the row address is not needed in a CAS operation. This allows the memory address for
each operation to be transmitted over a relatively narrow address bus (16-bit) in half of a clock cycle, thereby minimizing
access latency and making it possible to access the memory both synchronously and asynchronously.
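
The RAS, CAS and precharge operations described above behave like a small state machine around the sense-amplifier latches. The following behavioral sketch illustrates that sequence only; the class `MemoryArray` and its method names are hypothetical, and timing, command encoding on C[3:0] and equalization are not modeled.

```python
class MemoryArray:
    ROWS, WORDS_PER_ROW = 256, 32   # 1024 columns read out as 32-bit words

    def __init__(self):
        self.cells = [[0] * self.WORDS_PER_ROW for _ in range(self.ROWS)]
        self.open_row = None    # row currently held in the sense-amplifier latches
        self.latches = None

    def ras(self, row):
        """Row access: copy the designated row into the sense-amplifier latches."""
        if self.open_row is not None:
            raise RuntimeError("precharge required before another RAS")
        self.open_row = row
        self.latches = list(self.cells[row])

    def cas_read(self, column):
        """Column access (read): only the column address is needed."""
        if self.open_row is None:
            raise RuntimeError("no row is open")
        return self.latches[column]

    def cas_write(self, column, word):
        """Column access (write): data lands in the latches, not yet in the cells."""
        if self.open_row is None:
            raise RuntimeError("no row is open")
        self.latches[column] = word

    def precharge(self):
        """Write the latches back to the original row and prepare for the next RAS."""
        if self.open_row is not None:
            self.cells[self.open_row] = self.latches
        self.open_row = None
        self.latches = None


array = MemoryArray()
array.ras(7)
array.cas_write(3, 0xCAFE)
array.cas_write(4, 0xBEEF)   # same row: no second RAS needed
first = array.cas_read(3)
array.precharge()
array.ras(7)
second = array.cas_read(4)
```

The sketch makes the division of the address visible: `ras` never sees a column address and the CAS methods never see a row address, which is what allows each command to carry only part of the full address.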

To access a memory array in the precharged state, two operations, which take two bus clock cycles, are required.
Since transferring data from a memory array to the sense-amplifiers usually takes more than 20ns (longer than one
clock cycle), the command protocol of the present invention does not increase the memory access latency (RAS access
time). The command protocol of the present invention can be extended to any memory device having a row access time
substantially longer than column access time without increasing the access latency of the memory device.

The RAS-CAS-Precharge protocol of the present invention advantageously allows the memory device to operate
both synchronously and asynchronously. This aspect of the present invention is described in more detail below.

Data transfer to and from the sense-amplifiers is carried out in bursts. After accessing data identified by the CAS
address, data in subsequent CAS addresses is automatically accessed by an address sequencing circuit, without submitting
a new command or address. A word of data can be read or written every clock cycle, and an entire row of data,
e.g., 32 words, can be accessed in one burst of 32 clock cycles. Because each memory array has its own address
sequencing circuitry and column accessing circuitry, which are described in more detail below, each memory array is
capable of operating at the same frequency as the bus clock. In fact, a memory array in accordance with the present
invention can handle data bursts up to 1 gigabyte/second.

Memory arrays in conventional DRAM schemes are incapable of providing data at this frequency. In prior art DRAM
schemes, the data accessed from the DRAM is supplied by several memory arrays and each memory array is operating
at a significantly lower data bandwidth than the data I/O bandwidth. (See, for example, PCT patent document
PCT/US91/02590 [Farmwald et al.]; "A 100MHz 4Mb Cache DRAM with Fast Copy-back Scheme" [K. Dosaki, Y. Konishi,
K. Hayano, K. Himukashi, A. Yamazaki, C.A. Hart, M. Kumanoya, H. Hamano, and T. Yoshihara; ISSCC, 1992, pp. 148-149]).

Column Accessing Circuitry 

Fig. 6 shows the data path in the column area of a memory array in a conventional DRAM. Memory array 601
includes 256 rows and 1024 columns of memory cells. Two complementary bit lines connect each column in memory
array 601 to a sense-amplifier (SA) latch in sense-amplifier circuit 602. The two outputs of each SA latch are connected
to a corresponding column select switch in column switch circuit 603. The column select switches in column switch circuit
603 are controlled by signals on column select bus 605. When the column select switches corresponding to an SA latch
are closed, the SA latch is coupled to a corresponding complementary pair of data lines. Memory cell array 601 typically
uses two data line pairs, (1) DQ0 and its complement and (2) DQ1 and its complement. (See, "A 50-uA Standby 1Mx1/256Kx4 CMOS DRAM with
High-Speed Sense Amplifier", S. Fujii et al, IEEE JSSC, vol. sc-21, no. 5, Oct. 1986, pp. 643-648; and "A 60-ns 4-Mbit
CMOS DRAM with Built-in Self-Test Function", T. Ohsawa et al, IEEE JSSC, vol. sc-22, no. 5, Oct. 1987, pp. 663-668).
In column select circuit 603, 512 column switches are multiplexed on each data line pair. Each data line runs along
the long side of memory array 601. Consequently, the data line capacitance is large (about 4 to 5 pF). During read
operations, this data line capacitance is driven by the SA latches through the column switch circuit 603. The SA latches
have a relatively weak drive capability. Consequently, signals on the data lines have long rise and fall times, thereby
limiting the read data bandwidth.

During write operations, the data line capacitance is less of a problem because the data lines are driven directly by
a relatively large write buffer located outside of memory array 601. However, the write cycle-time is determined by the
write delay of the SA latch and the delay mismatch between the column address decoding path and the write data path.
The latter delay can be significant because the column address decoding path and the data path are routed in different
ways.

Fig. 7 is a block diagram illustrating the column address decoding path and the data path of a typical prior art DRAM
device. The column address bus 701 is connected in parallel to the memory arrays 702a-702g. However, the data path
is made up of data lines 703-706 from several arrays. Consequently, the difference in loading and logic between the two
paths is substantial.

Fig. 8 is a block diagram of a column data path in accordance with one embodiment of the present invention. Each
column of memory array 801 is connected to an SA latch in sense-amplifier circuit 802 by a bit line pair, such as bit line
pair 803. The outputs of sense-amplifier circuit 802 are provided to tree decoder circuit 804. Tree decoder circuit 804
includes thirty two 32-to-1 tree decoders. Each 32-to-1 tree decoder receives the complementary inputs from thirty two
SA latches. Each 32-to-1 tree decoder includes two levels of switches. A first level, which is controlled by an 8-bit signal
Sa[7:0], is constructed with four 8-to-1 multiplexers. The second level, which is controlled by a 4-bit signal Sb[3:0],
includes a 4-to-1 multiplexer. Each input to the 4-to-1 multiplexer is coupled to an output of one of the 8-to-1
multiplexers. Each 32-to-1 tree decoder provides a pair of complementary outputs to data amplifier circuit 805. These
complementary outputs correspond to the two outputs of the SA latch of the selected column.
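
The two-level selection performed by each 32-to-1 tree decoder can be sketched as follows. The sketch treats each SA latch output as a single value rather than a complementary pair, and the assignment of latches to first-level multiplexers (multiplexer g serving latches 8g to 8g+7) is an assumption; the function names are hypothetical.

```python
def one_hot_index(bits):
    """Index of the single asserted line; rejects zero or multiple asserted lines."""
    if sum(bits) != 1:
        raise ValueError("select lines must be one-hot")
    return bits.index(1)


def tree_decode(sa_latches, sa, sb):
    """Select one of 32 sense-amplifier outputs with a two-level switch tree.

    sa_latches: 32 values, one per SA latch
    sa: the 8 select lines Sa[7:0] as a list indexed Sa[0]..Sa[7]
    sb: the 4 select lines Sb[3:0] as a list indexed Sb[0]..Sb[3]
    """
    if len(sa_latches) != 32 or len(sa) != 8 or len(sb) != 4:
        raise ValueError("expected 32 inputs, 8 Sa lines and 4 Sb lines")
    i = one_hot_index(sa)
    # First level: four 8-to-1 multiplexers share the same Sa lines.
    # Assumption: multiplexer g serves latches 8*g .. 8*g+7.
    first_level = [sa_latches[8 * g + i] for g in range(4)]
    # Second level: one 4-to-1 multiplexer picks among the four first-level outputs.
    return first_level[one_hot_index(sb)]
```

Only 8 + 4 = 12 select lines are needed to reach any of the 32 latches, and the selected signal passes through just two switches on its way to the data amplifier.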

Data amplifier circuit 805 includes thirty two data amplifiers. Each data amplifier receives the complementary outputs 
from a corresponding 32-to-1 decoder. The thirty two data amplifiers are grouped into sixteen pairs. Each data amplifier 
pair provides a multiplexed signal to one of sixteen data lines. 

Fig. 9 is a schematic diagram of tree decoders 901 and 911 and data amplifier pair 900. Data amplifier pair 900
includes data amplifiers 902 and 912, multiplexer 907, read data latch 914, write buffers 903 and 913, tri-state buffer
905 and clock generation circuit 918.

The complementary outputs of tree decoders 901 and 911 are provided to data amplifiers 902 and 912, respectively.
Data amplifiers 902 and 912 are regenerative latches controlled by a single phase clock signal D_SENSE.

A local self-timed clock circuit 918 generates the control signals used to control data amplifiers 902 and 912 and
multiplexer 907. Thus, a precharge signal, PC, and a sensing signal, D_SENSE, are generated in response to bus clock
signal, Clk, column_access (CAS) signal and pre-charge signal, Write_Enable. The Clk signal is a buffered version of
the Sck signal. The PC and D_SENSE signals are local signals which are not used to drive any circuitry outside data
amplifier pair 900. Thus, timing skew in the control signals is minimized.

Read operation

To perform a read operation, the Write_Enable signal is de-asserted high. As a result, transistors 950-953 of write
buffers 903 and 913 are turned off and tri-state buffer 905 is placed in a low impedance state. The CAS signal is asserted
high. During a first half cycle of the Clk signal, the Clk signal is in a logic high state, thereby forcing both the D_SENSE
and PC signals to a logic high state. Under these conditions, the complementary outputs of tree decoders 901 and 911
are latched in data amplifiers 902 and 912, respectively.

For example, a logic low signal on lead 925 and a logic high signal on lead 926 cause transistors 971 and 972 to
turn on and transistors 970 and 973 to turn off. The high D_SENSE signal causes transistor 961 to turn on. As a result,
node 991 is pulled down to ground through transistors 972 and 961 and node 992 is pulled up to Vdd through transistor
971. In a similar manner, a logic low signal on lead 926 and a logic high signal on lead 925 results in node 992 being
pulled to ground through transistors 973 and 961 and node 991 being pulled to Vdd through transistor 970.

Data amplifier 912 operates in the same manner as data amplifier 902 to latch the signals present on leads 927
and 928. Thus, a logic high signal on lead 927 and a logic low signal on lead 928 results in node 993 being pulled up
to Vdd through transistor 974 and node 994 being pulled down to ground through transistors 977 and 962. Similarly, a
logic low signal on lead 927 and a logic high signal on lead 928 results in node 993 being pulled to ground through
transistors 976 and 962 and node 994 being pulled to Vdd through transistor 975.

Within multiplexer 907, the high D_SENSE signal causes transmission gates 995 and 997 to close (i.e., be placed in
a conducting state) and transmission gate 996 to open (i.e., be placed in a non-conducting state). As a result, the voltage
on node 992 is transmitted through transmission gate 995 and tri-state buffer 905 to the DQ data line 930. DQ data line
930 connects tri-state buffer 905 directly to the bus transceivers in the memory bus interface (See, e.g., inverters 304
and 306 in Figs. 3 and 4). This connection results in little loading other than the routing capacitance because there is
no other signal multiplexed on this line. Loading of DQ data line 930 is thus substantially smaller than that present in
prior art schemes. Consequently, the DQ data lines of the present invention are capable of operating at much higher
frequency (up to 250 MHz).

In addition, the voltage on node 993 is transmitted through transmission gate 997 and is stored in read data latch 914.

During the second half cycle of the Clk signal, the Clk signal transitions low, thereby forcing both the D_SENSE and
PC signals low. In response to the low PC signal, transistors 920-923 are turned on. As a result, leads 925-928 are
coupled to Vdd (i.e., leads 925-928 are precharged). In addition, the low D_SENSE signal opens transmission gates 995
and 997 and closes transmission gate 996. As a result, the voltage stored in read data latch 914 is read out through
transmission gate 996 and tri-state buffer 905 to DQ data line 930 during the second half cycle. In the foregoing manner,
dual-edge transfer of data from array 801 (Fig. 8) to data lines 806 is facilitated.
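
As a rough consistency check, the figures quoted in the text (sixteen data lines per memory array, two transfers per clock cycle, DQ lines running at up to 250 MHz) can be combined into a burst bandwidth. The calculation below is a sketch under the assumption that all sixteen data lines transfer on both half cycles at the full 250 MHz; the variable names are illustrative.

```python
DATA_LINES = 16            # data lines driven by the sixteen data amplifier pairs
TRANSFERS_PER_CLOCK = 2    # one transfer in each half cycle of Clk
CLOCK_HZ = 250_000_000     # upper DQ line frequency quoted in the text

bits_per_clock = DATA_LINES * TRANSFERS_PER_CLOCK      # one 32-bit word per clock
bytes_per_second = bits_per_clock * CLOCK_HZ // 8      # burst bandwidth in bytes/s
row_burst_seconds = 32 / CLOCK_HZ                      # 32-word row, one word per clock
```

Under these assumptions the result is 32 bits per clock cycle and 1,000,000,000 bytes per second, which agrees with the one-word-per-clock burst and the 1 gigabyte/second figure stated for a single memory array.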

Write operation

To perform a write operation, the Write_Enable signal is asserted low, thereby placing tri-state buffer 905 in a
high-impedance state and applying a logic low signal to an input of each of NOR gates 954-957 in write buffers 903 and 913.
During a first half cycle of the Clk signal, the Clk signal is in a logic low state, thereby closing transmission gate 906 and
opening transmission gate 916. The signal on the DQ data line 930 is therefore routed to an input of NOR gate 955. For
example, a high signal on the DQ data line 930 causes NOR gate 955 to provide a logic low signal to transistor 951,
thereby turning off this transistor. The low output of NOR gate 955 is also provided to an input of NOR gate 954, causing
NOR gate 954 to output a logic high signal which turns on transistor 950.

The low Write_Enable signal also causes the D_SENSE and PC signals to go high, thereby turning off p-channel
transistors 920-923 and turning on n-channel transistors 961-962. As a result, p-channel transistor 971 and n-channel
transistor 972 are turned on. Consequently, tree decoder 901 receives the Vdd supply voltage on lead 926 and the ground
supply voltage on lead 925, thereby writing a high data value to the selected column of memory array 801 (Fig. 8).

If the input from DQ data line 930 is a logic low signal (as opposed to a logic high signal as previously discussed),
tree decoder 901 receives the ground supply voltage on lead 926 and the Vdd supply voltage on lead 925 in a manner similar
to that previously described.

During the second half cycle of the Clk signal, the Clk signal transitions to a high state, thereby causing transmission
gate 906 to open and transmission gate 916 to close. The signal on the DQ data line 930 is then transmitted through
write buffer 913, data amplifier 912 and tree decoder 911 in a manner similar to that previously described. In this manner,
data is written from the DQ data line 930 to the memory array during each half cycle of the Clk signal. The demultiplexing
performed by transmission gates 906 and 916 is necessary because the address selected by tree decoders 901 and
911 changes only once every clock cycle.

Tree decoders 901 and 911 limit the multiplexing loading to approximately 12 lines (8+4) (as opposed to 512 lines
in a conventional scheme as previously described). The decreased capacitive loading together with the higher drive
signal provided by data amplifier circuit 805 increase the data bandwidth.

Delay Matching

High speed write operations are also facilitated by matching the address, data and clock paths. At the chip level,
the address and data paths of memory device 100 are matched automatically because they share the same set of bus
lines (multiplexed address and data) on the DASS bus (see Fig. 1). Delay matching between the clock and address/data
bus lines is relatively easy because the clock is part of the bus and the clock loading is light. As described in more detail
later, the clock loading is light because memory modules 111-128 are self-timed and do not rely on a global clock for
synchronization.

Inside memory modules 111-128, delay matching is achieved as follows. Gate delay matching is carried out by
inserting extra buffers in the paths with shorter delay. Delay mismatches caused by gate loading and routing capacitance
mismatches are minimized by using dummy loads.

The dominant source of delay mismatch comes from the column decoders 406a and 406b (Fig. 4). Column decoding
includes a predecoding stage and a final decoding stage. In the predecoding stage, five column address lines are split
into two groups with three column address lines connected to a 3-to-8 decoder and two column address lines connected
to a 2-to-4 decoder. The 3-to-8 and 2-to-4 decoders are conventional decoders, each consisting of two levels of simple
logic gates. The final decoding is performed by a 32-to-1 tree decoder (e.g., tree decoder 804 in Fig. 8) in the column
area. The above described column decoding scheme simplifies delay matching between the different paths because
the address path goes through relatively few simple logic gates when passing through the 3-to-8 and 2-to-4
decoders.
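
The predecoding stage can be sketched as a function that turns a 5-bit column address into the two groups of one-hot select lines. Which address bits feed which decoder is not stated in the text, so the split used here (low three bits to the 3-to-8 decoder, high two bits to the 2-to-4 decoder) and the function name `predecode` are assumptions.

```python
def predecode(column_address):
    """Split a 5-bit column address into one-hot Sa[7:0] and Sb[3:0] select lines.

    Assumption: the three low-order bits drive the 3-to-8 decoder and the two
    high-order bits drive the 2-to-4 decoder.
    Returns (sa, sb) as lists indexed Sa[0]..Sa[7] and Sb[0]..Sb[3].
    """
    if not 0 <= column_address < 32:
        raise ValueError("column address must fit in 5 bits")
    low3 = column_address & 0b111
    high2 = column_address >> 3
    sa = [1 if i == low3 else 0 for i in range(8)]    # 3-to-8 decoder
    sb = [1 if i == high2 else 0 for i in range(4)]   # 2-to-4 decoder
    return sa, sb
```

Exactly one line in each group is asserted for any column address, so the final selection in the column area needs only the 12 predecoded lines rather than a full 5-to-32 decode in the address path.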

Delay mismatches are further minimized by routing the clock, the pre-decoded column select signals Sa[7:0] and
Sb[3:0] (see Fig. 9), and the DQ lines in the same manner through the column area of the memory array.

Address Sequencing Circuitry 

Burst transfer of data requires a mechanism that automatically accesses data in consecutive address locations
given only the starting address of the data burst. Using the starting address, the memory device generates subsequent
addresses which are decoded to select the appropriate column lines. An address sequencer is needed to properly enable
the appropriate columns during a burst transfer.

Fig. 10 is a block diagram of a conventional address sequencer 1000 which includes an n-bit binary counter 1001,
an n-to-2^n decoder 1002 and a buffer 1003. (See, Motorola Memory Data Book, Device MCM62486A, pp. 7-100 - 7-109,
1992). The starting address is loaded from address bus 1011 to counter 1001 by activating the load signal input
to counter 1001. Address advancing is timed by a clock signal input to counter 1001. The output of counter 1001 is
decoded by decoder 1002 and then buffered by buffer 1003. The signals provided at the output of buffer 1003 are column
select signals that are activated one at a time to gate data words from the sense-amplifier latches. At every rising clock
edge, counter 1001 is incremented and its output is decoded to generate the next column select signal to activate the
next column select line. The column select lines are thus asserted in consecutive order, with each column select line
being asserted for the duration of one clock cycle.

One drawback to address sequencer 1000 is that the total delay from the rising clock edge to the activation of the
column select signals is the sum of the clock-to-out delay of counter 1001, the propagation delay of decoder 1002 and
the delay through buffer 1003. This total delay limits the burst frequency and therefore the access bandwidth. Another
problem arises because the delay paths through decoder 1002 are not uniform for each output transition. Non-uniform
decoder delay paths may cause simultaneous assertion of more than one column select signal for the duration of the
decoder delay mismatches. As a result, read or write failures may occur, especially during high-speed operation.

Fig. 11a is a block diagram of an address sequencer 1100 in accordance with the present invention. For simplicity,
3-bit decoding is shown. It is understood that the same principles can be applied to decode other numbers of bits in
accordance with the present invention. Address sequencer 1100 includes a 3-to-8 decoder 1101, an 8-stage barrel
shifter 1102 and buffers 1103. The 3-bit starting address is input to decoder 1101 on bus 1105. The 8-bit output of
decoder 1101 is loaded into barrel shifter 1102 when the load signal input to barrel shifter 1102 is activated.

Fig. 11b is a block diagram of one embodiment of barrel shifter 1102. Barrel shifter 1102 includes eight master/slave
D-type flip-flops 1120-1127 connected in a ring configuration. The outputs of 3-to-8 decoder 1101 are provided to the
PD inputs of flip-flops 1120-1127. Only one bit of the output of 3-to-8 decoder 1101 is high at any given time. A load
signal is provided to each of the L inputs of flip-flops 1120-1127 and a clock signal is provided to each of the C inputs
of flip-flops 1120-1127. The Q outputs of flip-flops 1120-1127 are provided to column select buffers 1103. Barrel shifter
1102 is capable of shifting right and left for address increment and decrement, respectively. However, for clarity, only
the right-shift configuration is shown.

Fig. 11c is a schematic diagram of master/slave D-type flip-flop 1120. In the embodiment illustrated, master/slave
D-type flip-flops 1120-1127 are identical. When the load signal is asserted high, transmission gate 1162 is closed and
the PD input is stored in the master latch formed by inverters 1150 and 1170. The load signal is only asserted high when
the clock signal is low. When the clock signal is low, transmission gate 1160 is open and transmission gate 1161 is
closed. As a result, the output of the master latch is transferred to the slave latch formed by inverters 1151 and 1171.
The Q output then has the same state as the signal applied to the PD input. Inverters 1150 and 1151 have weak output
drive so that they can be easily overcome by the drive of transmission gates 1160 and 1161.

Because the decoded address is loaded simultaneously to both the master and slave stage of D-type flip-flops 1120-1127,
the barrel shifter 1102 does not constitute a pipeline stage in the address path.

Once the output of 3-to-8 decoder 1101 has been loaded into flip-flops 1120-1127, the load signal is deasserted
low, effectively disconnecting the PD inputs of flip-flops 1120-1127 from 3-to-8 decoder 1101. The high bit which was
loaded into barrel shifter 1102 is then circulated through flip-flops 1120-1127 in a cyclical manner, with the high bit
shifting one flip-flop during each clock cycle.

The 8-bit output of barrel shifter 1102 is connected through buffer 1103 to consecutive column select lines Sa[7:0]
of tree decoders 901 and 911 (Fig. 9). The column select lines Sa[7:0] are thus asserted in consecutive order, one at a
time, for the duration of one clock cycle.

The total delay time of address sequencer 1100 is less than the total delay time of conventional address sequencer
1000. This is because address sequencer 1100 does not experience any delay associated with decoder 1101 after the
initial address is loaded into barrel shifter 1102. As a result, address sequencer 1100 can operate at much higher
frequencies than address sequencer 1000.
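
The load-then-shift behavior described above can be illustrated with a short behavioral sketch in Python. This is only a software model under stated assumptions, not the circuit itself: the function names are hypothetical, and mapping a right shift onto an increasing column index is an assumption made for illustration.

```python
def decode_3_to_8(addr):
    """One-hot decode of a 3-bit starting address (the role of decoder 1101)."""
    if not 0 <= addr < 8:
        raise ValueError("address must fit in 3 bits")
    return [1 if i == addr else 0 for i in range(8)]


def column_select_sequence(start_addr, cycles):
    """Yield the index of the asserted column select line for each clock cycle."""
    # Load phase: the decoder output is copied into the eight ring stages once.
    ring = decode_3_to_8(start_addr)
    for _ in range(cycles):
        # Exactly one stage holds the high bit, so exactly one line is asserted.
        yield ring.index(1)
        # Clock phase: the high bit moves one stage around the ring.
        # The decoder is not consulted again after the initial load.
        ring = [ring[-1]] + ring[:-1]


print(list(column_select_sequence(5, 6)))  # [5, 6, 7, 0, 1, 2]
```

Because the decoder is used only during the load, the per-cycle path in this model consists of the ring shift alone, which mirrors the delay argument made in the preceding paragraph.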

Additionally, because flip-flops 1120-1127 are identically constructed, the outputs of barrel shifter 1102 have uniform
clock-to-out delays. Furthermore, there are no combinational logic gates between the output of barrel shifter 1102 and
column select lines Sa[7:0]. Consequently, the clock-to-column-select-asserted time is well matched for all column select
lines, thereby avoiding simultaneous assertion of the column select lines and minimizing read or write failures caused
by address transitions.

An additional address sequencing circuit similar to address sequencer 1100 is used to generate four column select
signals Sb[3:0] in response to a two-bit input CA[4:3]. As previously discussed, column select signals Sb[3:0] control
the second level of switches (i.e., the 4-to-1 multiplexers) in tree decoders 901 and 911 (Fig. 9).

Clocking Scheme

In accordance with the present invention, a clock distribution scheme: (1) allows the memory device to operate both
synchronously and asynchronously, (2) minimizes skew to allow high-speed device operations, and (3) reduces operating
power.

Memory device 100 (Fig. 1) can operate both asynchronously and synchronously. To achieve synchronous operation,
self-timed design techniques, as exemplified by self-timed clock circuit 918 (Fig. 9) described above, and the
resynchronization circuit described below, are used in memory modules 111-128.

To meet the requirements of high-speed synchronous operations, delay matching between the control, address and
data paths is used. Acceptable delay matching is relatively easy to obtain in the present invention by using the previously
described circuit-module architecture and keeping the area of each memory module relatively small. Because the internal
operations of each memory module are independent of the internal operations of the other memory modules, timing
skew is confined to a relatively small area inside each memory module.

Timing skew is further minimized by the use of self-timed techniques which further localize the distribution of critical
control signals. The self-timed techniques enable functional blocks not involved in particular operations to be disabled
without a resynchronization penalty for activation. For example, when the Column_Access signal is low and the
Write_Enable signal is high, data amplifier pair 900 (Fig. 9) is turned off. As a result, the entire column circuitry is turned
off.

The circuit-module architecture also allows any memory module which is not involved with a bus transaction to be
automatically shut down. When DASS bus 102 is not in a data transfer state, i.e., no memory module is being accessed,
each memory module decodes commands on the DASS bus during each rising edge of the Sck signal. When a memory
read or write command is decoded, each memory module examines the communication ID of the command. All modules,
except the module to which the command is addressed, go into an idle state until the read or write transaction is finished.
Power dissipation in memory device 100 is therefore confined to small areas and involves only a small number of transistors,
thereby keeping the overall power consumption of memory device 100 relatively low. Consequently, memory device 100
is suitable for low power applications.

On DASS bus 102, source synchronous transfer is used to meet the synchronous and asynchronous operation
requirements. A source clock (Sck) signal and a destination clock (Dck) signal on DASS bus 102 facilitate the source
synchronous timing. The Sck signal is used to synchronize data, addresses and commands from the master I/O module
104 to memory modules 111-128. The Dck signal is generated by one of the memory modules 111-128 selected for
access to provide synchronization timing for data transmitted from memory modules 111-128 to I/O module 104.

The Dck signal is driven only by the memory module that is transmitting data. The Dck signal is generated within
the active memory module by routing the Sck signal through a delay path which approximates the read data path of the
memory module. Thus, while the Dck signal has the same frequency as the Sck signal, the Dck signal has no definite
phase relationship with the Sck signal or the read data. During synchronous operation, the data output from each memory
module must be synchronized with the Sck signal. A resynchronization circuit is therefore used to synchronize the data
read from the memory modules to the Sck signal.

Resynchronization circuit

A resynchronization circuit is incorporated in master I/O module 104 (Fig. 1) to synchronize data read from memory
modules 111-128 with the Sck signal during synchronous operation. During asynchronous operation, the
resynchronization circuit is disabled; that is, data read from memory modules 111-128 flows through the
resynchronization circuit with little delay.

Fig. 12 is a block diagram of resynchronization circuit 1200 which includes a 4-deep first-in-first-out (FIFO) memory
1202, a latency control circuit 1204, a phase-locked loop (PLL) circuit 1206 and a mode_select flip-flop 1207. FIFO 1202
receives a data input (Data_In) signal from the selected memory modules and provides temporary storage for the data
values in the Data_In signal. Write operations within FIFO 1202 are controlled by the Dck signal, the Read_Enable signal
and a mode_select signal.

Mode_select flip-flop 1207 is programmed by the Write_Enable signal and another signal received from DASS bus
102. The Q output of mode_select flip-flop 1207 is used as a mode_select signal. The mode_select signal enables FIFO
1202 and PLL 1206 when synchronous operating mode is selected (i.e., the mode_select signal is high). The
mode_select signal disables FIFO 1202 and PLL 1206 when asynchronous operating mode is selected (i.e., the
mode_select signal is low).

PLL circuit 1206 is a conventional circuit which generates an output clock (Out_Clk) signal in response to the Sck
signal. The Out_Clk signal is provided to FIFO 1202 and latency control circuit 1204. The Out_Clk signal is selected to
ensure that transitions in the Data_Out signal of FIFO 1202 are in phase with the Sck signal (taking into account delays
within FIFO 1202).

The programmable latency control circuit 1204 receives the Out_Clk signal, the Read_Enable signal, the
Write_Enable signal and an input signal from DASS bus 102. In response, latency control circuit 1204 generates an
Advance_Enable signal which is provided to FIFO 1202 to control the reading of data values out of FIFO 1202. As
discussed in more detail below, latency control circuit 1204 allows the user to set the number of half clock cycles between
the time a read command is detected and the time data is output from FIFO 1202.

Fig. 13 is a schematic diagram of one embodiment of FIFO 1202. FIFO 1202 contains four data latches 1301-1304,
an input sequencer 1310 and an output sequencer 1320. The Data_In signal is provided to data latches 1301-1304
through inverter 1305 on lead 1306. Data latches 1301-1304 include transistors 1307a-1307d, inverters 1308a-1308h
and transistors 1309a-1309d. The data values stored in latches 1301-1304 are subsequently transmitted through
tri-state buffer 1311 to output lead 1312 as the Data_Out signal. Tri-state buffer 1311 is enabled by the Read_Enable signal.

Transistors 1307a-1307d are controlled by input sequencer 1310. Input sequencer 1310 includes flip-flop 1315,
AND gates 1316a-1316e and inverter 1317. Input select bus 1318 couples the outputs of AND gates 1316a-1316d to
transistors 1307a-1307d. The outputs of AND gates 1316a-1316d provide input select signals In_Sel0-In_Sel3,
respectively.

Transistors 1309a-1309d are controlled by output sequencer 1320. Output sequencer 1320 includes flip-flop 1322,
AND gates 1324a-1324e and inverter 1326. Output select bus 1328 couples the outputs of AND gates 1324a-1324d to
transistors 1309a-1309d. The outputs of AND gates 1324a-1324d provide output select signals Out_Sel0-Out_Sel3,
respectively.

For synchronous operation, the mode_select signal is set high. When the Read_Enable signal is de-asserted high
and the Dck signal is low, input sequencer 1310 is reset so that latch 1301 is selected for input. When Read_Enable is
asserted low (i.e., after a read command is detected), input sequencer 1310 sequentially generates input select signals
In_Sel0-In_Sel3 on input select bus 1318. Input select signals In_Sel0-In_Sel3 sequentially enable transistors
1307a-1307d, respectively, one at a time at each transition of the Dck signal. This causes the data values in the Data_In signal
to be stored in consecutive latches 1301-1304.

Before the Advance_Enable signal is asserted high, output sequencer 1320 is reset so that latch 1301 is selected
for output. When the Advance_Enable signal is asserted high, output sequencer 1320 sequentially asserts output select
signals Out_Sel0-Out_Sel3 on output select bus 1328. Output select signals Out_Sel0-Out_Sel3 sequentially enable
transistors 1309a-1309d, respectively, one at a time at each transition of the Out_Clk signal.

Because FIFO 1202 has four latches, data stored in latches 1301-1304 of FIFO 1202 is overwritten every two clock
cycles. Therefore, data cannot remain in FIFO 1202 longer than 2 clock cycles before it is output to lead 1312. Since
the Dck signal and the Out_Clk signal have the same frequency, data stored in FIFO 1202 will be output correctly as
long as the Out_Clk signal does not lag the Dck signal by more than two clock cycles.
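
The overwrite constraint can be checked with a small behavioral model in Python. This is a sketch under stated assumptions rather than a model of the actual latches: time is measured in clock transitions (half cycles), one value is written per Dck transition and one is read per Out_Clk transition, `fifo_readout` and its parameters are hypothetical names, and a write and a read that coincide exactly are resolved write-first, which treats a lag of exactly four transitions pessimistically.

```python
def fifo_readout(data, lag, depth=4):
    """Return the values read from a ring of `depth` latches.

    Value k is written at time k; value k is read at time k + lag.
    `lag` is the read-clock lag in clock transitions and may be fractional.
    """
    # Event tuples sort by time, then kind (0 = write before 1 = read), then index.
    events = [(k, 0, k) for k in range(len(data))]
    events += [(k + lag, 1, k) for k in range(len(data))]
    latches = [None] * depth
    out = []
    for _, kind, k in sorted(events):
        if kind == 0:
            latches[k % depth] = data[k]     # input sequencer selects latch k mod depth
        else:
            out.append(latches[k % depth])   # output sequencer selects the same latch
    return out


words = list(range(8))
print(fifo_readout(words, 3.5) == words)  # True: lag under four transitions
print(fifo_readout(words, 4.5) == words)  # False: latches are overwritten before being read
```

In this model a value survives only until the write four transitions later reuses its latch, which corresponds to the two-clock-cycle limit stated above.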

Because of the access latency associated with memory modules 111-128, the Out_Clk signal actually leads the
Dck signal. Latency control circuit 1204 prevents the output sequencer 1320 of FIFO 1202 from being enabled until the
access latency has expired.

Fig. 14a is a schematic diagram of one embodiment of latency control circuit 1204. Latency control circuit 1204
includes eight latches 1420-1428 connected to form a delay queue. Fig. 14b is a schematic diagram of dynamic latch
1420. Latch 1420 includes transmission gates 1440-1442 and inverters 1443-1445. Latches 1421-1428 are identical to
latch 1420.

Latency register 1410 controls the number of clock edges (i.e., half clock cycles) which elapse after a read command
is detected before data is output from FIFO 1202. Latency register 1410 can be programmed with a 3-bit input through
DASS bus 102 when the Write_Enable signal is asserted. The contents of latency register 1410 are provided to a 3-to-8
decoder 1412. When the Read_Enable signal is high, each of dynamic latches 1420-1428 is isolated from its D input
and the outputs of 3-to-8 decoder 1412 are loaded into latches 1421-1428. Latch 1420 is loaded with zero because its
PD input is tied to ground.

When the Read_Enable signal is asserted low, latches 1420-1428 are disconnected from 3-to-8 decoder 1412,
thereby forming a delay queue. When the selected delay is an even number of half-clock cycles (i.e., Q0 = 0), the Q
output of latch 1421 is routed through transmission gate 1431 to provide the Advance_Enable signal. When the selected
delay is equal to an odd number of half-clock cycles (i.e., Q0 = 1), the Q output of latch 1420 is routed through transmission
gate 1430 to provide the Advance_Enable signal. Latch 1420 is provided to assure that the desired odd half clock cycle
delay is properly implemented.

For example, if one half clock cycle of delay is desired, latches 1420 and 1422-1428 are loaded with "0"s and latch
1421 is loaded with a "1". The value of Q0 is 1, thereby closing transmission gate 1430. When the Read_Enable signal
goes high, a delay queue is formed. This delay queue is clocked by the output of NAND gate 1450. NAND gate 1450
receives the Read_Enable signal, the Out_Clk signal and the inverted Advance_Enable signal. The inverted
Advance_Enable signal is created by transmitting the output of transmission gate 1430 through inverter 1451.

Because the Read_Enable and inverted Advance_Enable signals are high, the Out_Clk signal determines the output of
NAND gate 1450. Because the Out_Clk signal is initially high, the output of NAND gate 1450 on lead 1454 is initially
low. The output of NAND gate 1450 is also transmitted through inverter 1452 to lead 1453. As a result, the transmission
gate 1440 (Fig. 14b) of latch 1420 is initially open.

During the next half clock cycle, the Out_Clk signal transitions to a low state, thereby resulting in high and low signals
on leads 1454 and 1453, respectively. As a result, transmission gate 1440 (Fig. 14b) of latch 1420 closes and the data
value stored in latch 1421 (i.e., "1") is transmitted through inverters 1443 and 1444 of latch 1420. This "1" value is
transmitted through transmission gate 1430, resulting in a high Advance_Enable signal (and a low inverted
Advance_Enable signal). The high Advance_Enable signal enables the output stage of FIFO 1202.

The low inverted Advance_Enable signal forces the output of NAND gate 1450 to a logic high state, thereby stopping the
clocking of the latches 1420-1428. As a result, the Advance_Enable signal remains high until the read transaction is
terminated (i.e., the Read_Enable signal is deasserted).

The latency control circuit 1204 illustrated in Figs. 14a and 14b operates in a similar manner for even clock cycle
delays.

Resynchronization circuit 1200 operates correctly if the number of half clock cycles programmed into latency register
1410 is greater than the access latency and smaller than access latency plus 4 half-clock cycles (i.e., two clock cycles).
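
As a worked illustration of this window, the following Python sketch lists the latency settings that satisfy the stated inequality for a given access latency. The function name is hypothetical, the access latency is expressed in half clock cycles, and the assumption that the 3-bit register encodes settings 0 through 7 is made only for illustration; the text does not state the encoding.

```python
def valid_latency_settings(access_latency, fifo_depth=4, register_bits=3):
    """Return settings n (in half clock cycles) with
    access_latency < n < access_latency + fifo_depth."""
    max_setting = 2 ** register_bits - 1   # assumed encoding: 0 .. 7
    return [n for n in range(max_setting + 1)
            if access_latency < n < access_latency + fifo_depth]


# Example: an access latency of 1.5 half cycles (less than one clock period).
print(valid_latency_settings(1.5))  # [2, 3, 4, 5]
```

With an access latency below one clock period, both a three and a four half-cycle setting fall inside the window in this model.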

Fig. 15 is a waveform diagram which illustrates the timing of various data and control signals during synchronous
operation with latency register 1410 set to a four half-cycle delay. A read command is detected at the rising edge of the
Sck signal at point 1501. After a slight delay, the Read_Enable signal is asserted low. Once the access latency has
expired, data value D0 of the Data_In signal is written into FIFO 1202. In this example, the access latency is less than
one period of the Sck clock signal.

Upon receiving the Dck signal, input sequencer 1310 (Fig. 13) sequentially generates input select signals
In_Sel0-In_Sel3. Input select signal In_Sel0 is initially high, thereby turning on transistor 1307a and allowing data value D0 to
be written into latch 1301. Shortly after the Dck signal transitions to a low state, input select signal In_Sel0 is de-asserted
and input select signal In_Sel1 is asserted, turning on transistor 1307b and allowing data value D1 to be written into
latch 1302. This process is continued, with input select signals In_Sel0-In_Sel3 sequentially enabling transistors
1307a-1307d to write data values into latches 1301-1304.

Because the Advance_Enable signal is initially low, output select signal Out_Sel0 is initially high. Consequently,
transistor 1309a is initially closed and data value D0 is transmitted out of FIFO 1202 to output lead 1312 once the access
latency has expired. A short flow-through latency associated with transmitting the data value D0 through latch 1301 is
not illustrated in Fig. 15.

Because latency register 1410 has been programmed with a four half-cycle delay, the Advance_Enable signal
transitions to a high state during the fourth transition of the Out_Clk signal after the Read_Enable signal is asserted. Shortly
after the Advance_Enable signal transitions to a high state, output select signal Out_Sel0 transitions to a low state and
output select signal Out_Sel1 transitions to a high state, thereby opening transistor 1309a and closing transistor 1309b.
As a result, data value D1 is read out of latch 1302 to output lead 1312. The delay introduced by latency register 1410
spans the resynchronization latency as well as the access latency. The resynchronization latency is the difference
between the Sck signal and the Dck signal. Given the waveform diagram of Fig. 15, the data value D1 could have been
read out at point 1502 if the latency register 1410 had been programmed for a three half clock cycle delay. However, by
programming latency register 1410 with a four half clock cycle delay, the user is able to add a half cycle of latency.

This process is continued, with output select signals Out_Sel0-Out_Sel3 sequentially enabling transistors
1309a-1309d to read data values out of latches 1301-1304.

Fig. 16 is a waveform diagram illustrating the timing of resynchronization circuit 1200 during asynchronous operation.
During an asynchronous operation, the mode_select signal of flip-flop 1207 (Fig. 12) is set low, thereby disabling PLL
circuit 1206. As a result, the Out_Clk signal and Advance_Enable signals are also disabled. Consequently, the output
of AND gate 1324e (Fig. 13) is set low and flip-flop 1322 is disabled with its output S set high. Thus, both inputs of AND
gate 1324a are high, causing the Out_Sel0 signal to transition to a high state and turning on transistor 1309a of latch
1301.

On the input side, the low mode_select signal is transmitted through inverter 1350 to NOR gate 1351. As a result,
flip-flop 1315 is disabled and its output Q is set to a high state. The low mode_select signal is also provided to AND gate
1316e, thereby causing a logic low signal at the output of AND gate 1316e. As a result, both inputs to AND gate 1316a
are high, the In_Sel0 signal transitions to a high state and transistor 1307a of latch 1301 is turned on.
Consequently, data value D0 of the Data_In signal is transmitted through latch 1301. A small flow-through delay 1602
is associated with the transmission of the data value through latch 1301. Both the Out_Sel0 and In_Sel0 signals stay
high as long as the mode_select signal from mode_select flip-flop 1207 is low.

Multiple-module and Multiple-array operations

The circuit-module architecture of the present invention is well suited for multiple array operations. Operations such 
as broadcast-write and interleaved burst allow data from different memory arrays in different modules to be accessed 
simultaneously, thereby increasing the performance of the memory device.

Fig. 17 is a block diagram of memory device 1700 which is used to perform a broadcast-write operation. Memory
device 1700 includes memory modules 1711-1728 which are connected in parallel to master I/O device 1704 through
DASS bus 1702. Each of memory modules 1711-1728 has two memory arrays. Two memory array-select bits are
provided in the access-control register of each memory module 1711-1728. These two bits are set or reset by a
"Broadcast-write Select" command received on DASS bus 1702. Once an array-select bit is set, the associated array is selected for
participating in the subsequent write operations. A selected array remains selected until its associated array-select bit
is reset. One or both arrays in a module can be selected. Furthermore, one or more modules can be selected. A write
operation writes a data stream to all selected arrays simultaneously.

In the embodiment illustrated in Fig. 17, memory array 1732 in module 1711 and memory arrays 1730 and 1731 in
module 1728 are selected by programming the memory array-select bits in these modules. In other embodiments, other
memory arrays and/or memory modules may be selected. After the desired arrays have been selected, a stream of write
data is broadcast from I/O device 1704 to DASS bus 1702 and this data is simultaneously written into memory arrays
1730-1732.
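
The selection mechanism can be sketched behaviorally in Python as follows. The class and function names are hypothetical, the data structures are illustrative only, and the label "1711-other" is a placeholder for the second array of module 1711, whose reference numeral is not given in the text.

```python
class Module:
    """Behavioral stand-in for a memory module with per-array select bits."""

    def __init__(self, module_id, array_ids):
        self.module_id = module_id
        self.arrays = {a: {} for a in array_ids}           # array -> {address: word}
        self.array_select = {a: False for a in array_ids}  # array-select bits

    def broadcast_write_select(self, array_id, selected):
        # Models the effect of a "Broadcast-write Select" command on one bit.
        self.array_select[array_id] = selected

    def write(self, address, word):
        # Every selected array in this module stores the same word.
        for array_id, selected in self.array_select.items():
            if selected:
                self.arrays[array_id][address] = word


def broadcast_write(modules, start_address, words):
    """Present each word of the stream to all modules on the shared bus."""
    for offset, word in enumerate(words):
        for module in modules:
            module.write(start_address + offset, word)


m1711 = Module(1711, [1732, "1711-other"])  # second array label is a placeholder
m1728 = Module(1728, [1730, 1731])
m1711.broadcast_write_select(1732, True)
m1728.broadcast_write_select(1730, True)
m1728.broadcast_write_select(1731, True)
broadcast_write([m1711, m1728], 0, ["A", "B"])
print(m1711.arrays[1732], m1728.arrays[1730], m1728.arrays[1731])
```

Arrays whose select bit is left reset, such as the placeholder array in module 1711, are unaffected by the broadcast in this model.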

In graphics applications, when the memory device 1700 is used as a display buffer, a fixed pattern can be
simultaneously written into multiple display screen memory locations using a broadcast-write operation, thereby significantly
increasing the graphics update bandwidth.

Another multiple-array operation is an interleaved burst operation, in which a read or write command causes data
to be read from or written to different arrays in a time-multiplexed data burst. Instead of bursting data into or out of a
single array, multiple arrays participate in a time-multiplexed manner. Each participating array latches in (or sends out) a
piece of data (i.e., one or more words) during a specified time period (i.e., one or more clock cycles) in a consecutive
manner.

Fig. 18 is a waveform diagram illustrating the addressing of read (or write) operations during an interleaved burst
operation. An interleave-enable bit in the access-control register of each memory module determines whether an
interleaved burst operation will be performed. The interleave-enable bit of each memory module is programmed from a
command transmitted on the DASS bus. In one embodiment, another three bits in each access-control register
determine the total number of arrays which will participate in the interleaved operation. In such an embodiment, up to eight
memory arrays can participate in an interleaved operation. In other embodiments, other numbers of memory arrays can
participate in the interleaved operations.

The waveform of Fig. 18, which is referenced to the structure of memory device 1700 (Fig. 17), illustrates one such
interleaving sequence. In this interleaving sequence, the interleave-enable bits in modules 1727 and 1728 are set. In
addition, the access-control registers in modules 1727 and 1728 are programmed to indicate that four memory arrays
will participate in the interleaved access. A read command is then addressed to column address 7, module 1727, array
1741 (D771). Data words are then sequentially read out of the following addresses: column address 7, module 1727,
array 1741 (D771); column address 7, module 1728, array 1730 (D780); column address 7, module 1728, array 1731
(D781); column address 7, module 1727, array 1740 (D770); and column address 8, module 1727, array 1741 (D871).
The sequence continues until the interleaved-burst read command is terminated. Each participating array takes a turn,
in a round robin fashion, to send a data word.
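
The round-robin addressing of this example can be reproduced with a short Python sketch. The function name and the tuple layout are hypothetical, and the rule that the column address advances once every participating array has taken a turn is an inference from the sequence given above rather than a stated specification.

```python
from itertools import count, islice


def interleaved_burst(ring, start, start_column):
    """Yield (column, module, array) for each word of an interleaved burst.

    `ring` lists the participating (module, array) pairs in their turn order;
    `start` is the pair to which the command is addressed.
    """
    first = ring.index(start)
    n = len(ring)
    for k in count():
        module, array = ring[(first + k) % n]
        # Assumed rule: the column advances after a full round of n arrays.
        yield (start_column + k // n, module, array)


ring = [(1727, 1740), (1727, 1741), (1728, 1730), (1728, 1731)]
sequence = list(islice(interleaved_burst(ring, (1727, 1741), 7), 5))
for column, module, array in sequence:
    print(f"column {column}, module {module}, array {array}")
```

The first five entries produced by this sketch correspond to the words labeled D771, D780, D781, D770 and D871 in the example.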

RAS and precharge operations are performed simultaneously in each memory array selected for an
interleaved-burst operation. A RAS or precharge operation addressed to any of the selected memory arrays causes all of the selected
arrays to simultaneously perform the RAS or precharge operation. This eliminates the need to issue multiple commands
to multiple arrays to perform multiple RAS or precharge operations. Consequently, command overhead is saved.

In graphics applications, where rows of memory cells in adjacent arrays are mapped to consecutive horizontal lines
in a display screen (see, e.g., U.S. Patent No. 4,980,765 issued to Kudo et al.), an interleaved-burst operation allows
pixels in consecutive lines to be accessed in one data burst. In another embodiment, an interleaved-burst operation is
used to perform graphical operations such as line draw and polygon draw, which require fast access to consecutive
pixels in both the horizontal and vertical directions.

In addition to the single-command multiple-data operations described above, a memory device in accordance with 
the present invention provides multiple commands, one after another, to different arrays. For example, a RAS command 
to a first memory array can be followed by another RAS command to a second memory array without waiting for the 
RAS command in the first array to finish, which in turn can be followed by a precharge command to a third memory 
array, followed by a CAS read command to a fourth memory array. Therefore, multiple memory arrays can perform 
multiple operations simultaneously, thereby increasing the performance of the memory device. 

Reduced Swing I/O Bus Structure and Protocol

In certain embodiments, the I/O bus 106 (Fig. 1) connects multiple memory devices (such as memory device 100)
to form a memory system with a larger memory capacity and/or more functions. One or more master devices can be
attached to the I/O bus 106 to control the operations in the system. A master device can be a bus master in certain bus
transactions and a slave in the other bus transactions.

Fig. 19 is a block diagram of a memory system 1900 in accordance with one embodiment of the present invention.
Memory system 1900 uses memory controller 1920 as a master device and multiple DASS memory devices 1901-1908
as slave devices. One port of memory controller 1920 is coupled to a CPU through CPU bus 1931. Another port of
memory controller 1920 is coupled to memory devices 1901-1908 through an I/O bus 1930. In an alternate embodiment,
memory controller 1920 resides in the I/O module of one of memory devices 1901-1908.

I/O bus 1930, which employs high-speed Reduced CMOS Swing (RCS) for signaling, includes: 16 bi-directional
lines ADQ[15:0] for multiplexed address and data signals, 4 lines C[3:0] for command signals, 2 lines Dm[1:0] for
write-mask signals, 1 line for a synchronization clock signal Mck, and 1 line for a clock enable signal Cke. The Cke and Mck
signals are specific to I/O bus 1930. However, the remaining signals on I/O bus 1930 are extensions of the signals
present on the DASS buses which exist within each of memory devices 1901-1908. Thus, the I/O modules in memory
devices 1901-1908 (similar to I/O module 104 in memory device 100) are interface bridges between the DASS buses
of memory devices 1901-1908 and I/O bus 1930. However, unlike the DASS buses, which use source synchronization
for the timing of information transfer, I/O bus 1930 is fully synchronous with a single clock signal (Mck). The protocol
used in I/O bus 1930 is a super-set of the protocol used in the DASS buses. However, the protocol used in the DASS
buses does not include the protocol involving the Cke signal. The Cke signal is used for stopping and starting the clocks
inside the memory devices 1901-1908. This allows devices of slower speed to be attached to I/O bus 1930 without
lowering the system clock (Mck) frequency.

Dedicated chip select (CS) lines to each of memory devices 1901-1908 are also included for system initialization.
At power-up or after system reset, the communication addresses of the memory modules in memory devices 1901-1908
are reset to their default values. As a result, memory modules in different memory devices 1901-1908 may have the
same communication address. The CS lines are used to program the memory modules within memory devices
1901-1908 so that the memory modules have different communication addresses in the overall memory system 1900.

Address mapping in a Multi-device Memory System 

All devices attached to I/O bus 1930 are assigned unique communication addresses. This can be accomplished 
either by hardwired logic or by incorporating programmability in the ID assigning mechanism in the devices. In certain 
embodiments, a memory device can assume one or more communication addresses. Each memory module within 
memory devices 1901-1908 assumes a communication address. For memory operations, the communication address 
is contained in the memory address as a field. Each memory module spans a contiguous memory address space. 
However, the address space spanned by each memory device does not need to be contiguous since the communication 
address of each module can be individually programmed. By maintaining the same sets of commands and protocols in 
I/O bus 1930 and the DASS buses in memory devices 1901-1908, the ID registers of all modules in memory devices
1901-1908 are programmable through I/O bus 1930. Consequently, all modules in memory system 1900 can be
dynamically assigned communication addresses to span different areas in the memory address space.

In one application, the communication addresses of the modules are assigned such that memory system 1900 has
a contiguous memory space. In another application, the dynamic address mapping capability of the present invention
is used in computer systems operating on virtual memory addresses. In conventional memory devices which map to a
fixed address space, the virtual address has to be translated to a physical address before a memory access can be
carried out. This required translation increases system complexity and memory access latency. However, using the
present memory system, the communication address of a memory module can be programmed to assume a virtual
address. A memory access can then be carried out without performing an address translation. Managing such a memory
system is straightforward because allocating and de-allocating memory pages is a matter of changing the communication
addresses of one or more memory modules. Because the present memory system is capable of operating with virtual
addresses, it can be referred to as a "Virtual Main Memory".
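
The idea of remapping by reprogramming communication addresses can be sketched in Python as follows. This is an illustrative model only: the class names, the 16-bit offset field width and the `remap` method are assumptions introduced for the example and are not taken from the text, which states only that the communication address is a field of the memory address.

```python
OFFSET_BITS = 16  # hypothetical width of the within-module offset field


def split_address(address):
    """Split a memory address into (communication address, offset)."""
    return address >> OFFSET_BITS, address & ((1 << OFFSET_BITS) - 1)


class MemoryModule:
    def __init__(self, comm_id):
        self.comm_id = comm_id   # programmable ID register
        self.cells = {}          # offset -> stored word


class MemorySystem:
    def __init__(self, modules):
        self.modules = modules

    def _select(self, address):
        comm_id, offset = split_address(address)
        for module in self.modules:
            if module.comm_id == comm_id:   # only the addressed module responds
                return module, offset
        raise LookupError("no module holds this communication address")

    def write(self, address, word):
        module, offset = self._select(address)
        module.cells[offset] = word

    def read(self, address):
        module, offset = self._select(address)
        return module.cells[offset]

    def remap(self, old_id, new_id):
        """Reprogram an ID register; the stored data moves in address space."""
        for module in self.modules:
            if module.comm_id == old_id:
                module.comm_id = new_id


system = MemorySystem([MemoryModule(3), MemoryModule(4)])
system.write((3 << OFFSET_BITS) | 5, "page data")
system.remap(3, 9)
print(system.read((9 << OFFSET_BITS) | 5))  # page data
```

In this model, allocating or de-allocating a page amounts to a single `remap` call, with no separate translation step on each access.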

Fault Tolerant System 

Memory system 1900 (Fig. 19) is highly tolerant to defects. Three levels of redundancy provide memory system
1900 with high fault tolerance. At the system level, each memory device 1901-1908 incorporates a disable register which,
when set, disables the device from participating in memory transactions on the I/O bus 1930. Redundant devices can
be easily incorporated on I/O bus 1930 for repairing defective devices on the bus.

Within each memory device 1901-1908, redundant memory modules are incorporated, and each
memory module includes an ID register which is programmable through commands on I/O bus 1930. This redundancy
mechanism allows for the efficient repair of defective modules both locally in the memory device and globally in other
devices attached to I/O bus 1930. That is, any of the redundant modules in any of the memory devices 1901-1908 can
replace any defective memory module in any of the memory devices 1901-1908. As more memory devices are added
to memory system 1900, the ratio of redundant modules to regular modules is maintained, but the ability to repair cluster
defects increases. For example, in a memory system having four memory devices, with each memory device having
one redundant module, a cluster defect involving four or fewer modules can be repaired without any degradation in
performance. This is advantageous because cluster defects are the predominant cause of integrated system failure.
By contrast, redundant memory modules of traditional redundancy schemes can only be used to replace memory modules
within the same memory device (i.e., on the same chip).
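The four-device example above can be checked with a small Python sketch. The function name, the tuple representation of a module and the module indices are hypothetical and serve only to contrast the global repair described here with a scheme restricted to spares on the same chip.

```python
from typing import List, Tuple

ModuleRef = Tuple[int, int]  # (device index, module index within that device)


def repairable_count(defective: List[ModuleRef],
                     spares: List[ModuleRef],
                     same_device_only: bool) -> int:
    """Count how many defective modules can be paired with a redundant module."""
    free = list(spares)
    repaired = 0
    for device, _module in defective:
        for spare in free:
            # A traditional scheme may only use a spare on the same device;
            # the bus-programmable ID register removes that restriction.
            if not same_device_only or spare[0] == device:
                free.remove(spare)
                repaired += 1
                break
    return repaired


# Four devices, each with one redundant module (index 8 is an arbitrary label).
spares = [(device, 8) for device in range(4)]
# A cluster defect hitting four regular modules, all inside device 0.
cluster = [(0, 0), (0, 1), (0, 2), (0, 3)]

global_repairs = repairable_count(cluster, spares, same_device_only=False)
local_repairs = repairable_count(cluster, spares, same_device_only=True)
```

Under these assumptions the global scheme repairs all four modules of the cluster, while the same-device scheme repairs only one.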

Within each memory array, redundant rows and columns are used to repair defects inside the respective memory
array as previously described in connection with Figs. 5a and 5b.

I/O Bus Drivers, Receivers and Terminations

Electrically, the signals on I/O bus 1930 have a swing of approximately 2 volts centered around the middle of the
supply voltage. The actual signal swing can be adjusted to optimize the operating frequency and minimize power
dissipation. Two types of termination are used on I/O bus 1930 to suppress transmission line effects such as reflections and
ringing. Details of the structure of the bus transceiver and termination are described below.

In order to operate I/O bus 1930 at high clock frequencies, small-swing signaling is employed. To maximize the noise
immunity and data rate, and minimize the complexity of the bus transceiver circuit, a logic threshold equal to half of the
supply voltage (Vdd) is used. This threshold voltage matches the threshold voltage of the rest of the on-chip CMOS logic.
Consequently, logic translation circuitry is eliminated. An active clamp or a passive clamp is used to limit the signal swing.

Figs. 20a and 20b are schematic diagrams of active clamp 2002 and passive clamp 2011, respectively. Clamps
2002 and 2011 limit the swing on a bus line 2030 of I/O bus 1930. P-channel transistor 2004 and n-channel transistor
2005 form push-pull driver 2001 with equal sourcing and sinking capability. This balanced drive capability makes the
signal transition of bus line 2030 symmetrical, thereby eliminating signal skew and maximizing the operating bandwidth
of bus line 2030. The balance in pull-up and pull-down also yields a circuit with maximum supply noise rejection because
transistors 2004 and 2005 spend equal amounts of time in the saturation region during signal transition. In fact, when
properly selected, transistors 2004 and 2005 remain in the saturation region at all times, giving bus line 2030 maximum
immunity to supply (Vdd) and ground (GND) noise.

The gates of transistors 2004 and 2005 are driven by the outputs of NAND gate 2031 and NOR gate 2032, respectively.
Logic gates 2031 and 2032 receive a Data_In signal and a Read_Enable signal as illustrated. The Read_Enable
signal, when de-asserted high, turns off transistors 2004 and 2005, thereby tri-stating the bus driver.
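The driver gating described above can be written out as a truth-table sketch in Python. The figure is not reproduced here, so one detail is an assumption: for a high Read_Enable to turn off p-channel transistor 2004, the NAND gate is taken to receive the complement of Read_Enable, while the NOR gate receives Read_Enable directly. The function name and the string return values are hypothetical.

```python
def bus_driver(data_in: int, read_enable: int) -> str:
    """Return the state driven onto the bus line: 'high', 'low' or 'hi-z'.

    read_enable is active low: 0 enables the driver, 1 tri-states it.
    """
    enabled = 1 - read_enable                 # assumed inversion ahead of the NAND gate
    nand_out = 1 - (data_in & enabled)        # drives the gate of p-channel 2004
    nor_out = 1 - (data_in | read_enable)     # drives the gate of n-channel 2005

    pull_up_on = nand_out == 0                # a p-channel device conducts on a low gate
    pull_down_on = nor_out == 1               # an n-channel device conducts on a high gate
    assert not (pull_up_on and pull_down_on), "both driver transistors on"

    if pull_up_on:
        return "high"
    if pull_down_on:
        return "low"
    return "hi-z"
```

With the driver enabled, both gate signals equal the complement of Data_In, so exactly one transistor conducts; with Read_Enable high, both transistors are off regardless of Data_In.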

Receiver 2003 is a CMOS inverter which includes transistors 2008 and 2009. Receiver 2003 has equal pull-up and
pull-down capability. The input of receiver 2003 is coupled to bus line 2030 and the output of receiver 2003 provides a
Data_Out signal.

Active clamp circuit 2002 (Fig. 20a) includes a CMOS inverter 2020 and clamp transistors 2006 and 2007 connected
as source followers. The sizes of transistors 2006 and 2007 control the voltage swing on bus line 2030. In one
embodiment, the sizes of transistors 2006 and 2007 are twice the sizes of transistors 2005 and 2004, respectively. When
bus line 2030 is driven from high to low by bus driver 2001, and the voltage on bus line 2030 has not reached Vdd/2
volts, the output of inverter 2020 is low, transistor 2007 is on and transistor 2006 is off. When the voltage on bus line 2030
is pulled below Vdd/2 volts, the output of inverter 2020 goes high, turning transistor 2007 off and turning transistor 2006 on,
thereby taking away the sinking current available to bus line 2030. As the voltage on bus line 2030 continues to go down,
transistor 2006 is turned on stronger, thereby taking more sinking current from bus line 2030. When the voltage on bus
line 2030 is approximately 1.5 VTP above ground, the current through transistor 2006 equals the current through transistor
2005, and the voltage on bus line 2030 becomes steady. VTP is the turn-on threshold voltage of transistor 2007
(typically 1 volt).

Similarly, a low to high transition of bus line 2030 causes transistor 2006 to turn off and transistor 2007 to turn on,
with the voltage on bus line 2030 clamped at approximately 1.5 VTN below Vdd, where VTN is the turn-on threshold
voltage of transistor 2006 (typically 1 volt).
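The two clamp levels stated above fix the signal swing, which a short Python calculation makes explicit. The function name is hypothetical, and the 5 volt supply used in the example is an assumption chosen for illustration; the text gives only the 1.5 VTP and 1.5 VTN clamp levels and the typical 1 volt thresholds.

```python
from typing import Tuple


def active_clamp_levels(vdd: float, vtn: float = 1.0, vtp: float = 1.0,
                        factor: float = 1.5) -> Tuple[float, float, float, float]:
    """Return (low level, high level, swing, midpoint) in volts.

    The low level sits about factor * VTP above ground and the high level
    about factor * VTN below Vdd, as described for active clamp 2002.
    """
    v_low = factor * vtp
    v_high = vdd - factor * vtn
    swing = v_high - v_low
    midpoint = (v_high + v_low) / 2
    return v_low, v_high, swing, midpoint


# Assumed 5 V supply with the typical 1 V thresholds quoted in the text.
v_low, v_high, swing, midpoint = active_clamp_levels(vdd=5.0)
```

Under the assumed 5 volt supply this gives levels of 1.5 V and 3.5 V, that is, a 2 volt swing centered on Vdd/2, which agrees with the approximately 2 volt swing quoted for I/O bus 1930.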

Passive clamp 2011 (Fig. 20b) is a resistor divider. Equal-value resistors 2016 and 2017 are connected between
Vdd, bus line 2030 and ground. Passive clamp 2011 can also be a Thevenin equivalent of a resistor divider. For example,
a resistor having half the resistance of resistor 2016 can be connected to a supply voltage equal to half of Vdd. Passive
clamp 2011 takes advantage of the finite output resistance of the driver transistors 2004 and 2005. When bus line
2030 is driven from low to high, transistor 2005 is turned off and transistor 2004 is turned on. Initially, transistor 2004
and resistor 2017 source more current than resistor 2016 can sink, thereby pulling the voltage on bus line 2030 high.
As the voltage on bus line 2030 continues to rise, the sourcing capability of both transistor 2004 and resistor 2017
decreases and the sinking capability of resistor 2016 increases. This continues until the total source current is equal to
the sink current. The voltage on bus line 2030 then remains constant until the Data_In signal changes. Similarly, when
bus line 2030 is driven from high to low, the voltage on bus line 2030 is clamped when the source current in resistor
2017 equals the total sink current in transistor 2005 and resistor 2016. The voltage swing can be adjusted by varying
the size of driver transistors 2005 and 2004 or the value of resistors 2016 and 2017.
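The current balance described above can be approximated with a simple resistive model in Python. This is a sketch under strong assumptions: each conducting driver transistor is replaced by a fixed on-resistance `r_on` of equal value for pull-up and pull-down, and the resistor and supply values are hypothetical. A real transistor is not linear, so the numbers indicate the trend only.

```python
from typing import Tuple


def parallel(a: float, b: float) -> float:
    """Resistance of two resistors connected in parallel."""
    return a * b / (a + b)


def passive_clamp_levels(vdd: float, r_on: float, r_clamp: float) -> Tuple[float, float]:
    """Return the (low, high) steady-state bus voltages for equal clamp resistors.

    High state: transistor 2004 (r_on) and resistor 2017 source current from Vdd,
    resistor 2016 sinks it to ground. Low state: the mirror image with 2005 and 2016.
    """
    driven_side = parallel(r_on, r_clamp)        # driver in parallel with one clamp resistor
    v_high = vdd * r_clamp / (r_clamp + driven_side)
    v_low = vdd * driven_side / (r_clamp + driven_side)
    return v_low, v_high


# Hypothetical values: 5 V supply, 100 ohm driver on-resistance.
wide = passive_clamp_levels(vdd=5.0, r_on=100.0, r_clamp=100.0)
narrow = passive_clamp_levels(vdd=5.0, r_on=100.0, r_clamp=50.0)
```

In this model the two levels are always symmetric about Vdd/2, and lowering the clamp resistance relative to the driver resistance narrows the swing, consistent with the statement that the swing is set by the driver size or the resistor values.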

When using either active clamp 2002 or passive clamp 2011, only one clamp circuit per bus line is required. The
clamp circuit can be integrated in the master device, in one of the slave devices or in a separate device. Also, in both
clamping circuits, bus line 2030 is clamped or terminated with a relatively small value resistance. In active clamp 2002,
transistors 2006 and 2007 have relatively low output resistances since they are connected as source followers. In passive
clamp 2011, the Thevenin equivalent of the resistor divider facilitates the termination. The termination in both cases
suppresses reflection and ringing which can degrade the signal-to-noise ratio and limit the operating frequency on bus
line 2030.

This disclosure is illustrative and not limiting; further modifications and variations will be apparent to those skilled 
in the art in light of this disclosure and are intended to fall within the appended claims. 

Claims 

1. A data processing system comprising:
a bus;

a plurality of slave devices coupled in parallel to said bus, each of said slave devices having a slave bus
transceiver for transmitting and receiving signals on said bus;
a master device coupled in parallel to said bus, said master device having a master bus transceiver for transmitting
and receiving signals on said bus, wherein signals transmitted from said slave bus transceivers to said master
bus transceiver vary over a smaller voltage range than signals transmitted from said master bus transceiver to said
slave bus transceivers.

2. The data processing system of Claim 1, wherein said bus, said slave devices and said master device are all fabricated
on one chip. 

3. The data processing system of Claim 1, wherein said data processing system is provided with a first supply voltage
and a second supply voltage, and said master bus transceiver further comprises: 

a clamping circuit coupled to said bus, wherein said clamping circuit limits the signals on said bus within a 
voltage range which is less than the difference between the first and second supply voltages when said clamping 
circuit is enabled, and wherein the signals on said bus are limited to a voltage range which is approximately equal 
to the difference between the first and second supply voltages when said clamping circuit is disabled; 
a bus receiver circuit coupled to said bus; 
a bus driver circuit coupled to said bus; and 

means for enabling said clamping circuit when said bus receiver circuit is receiving signals from said bus and 
disabling said clamping circuit when said bus driver circuit is transmitting signals to said bus. 

4. The data processing system of Claim 1, wherein said bus comprises a plurality of bus lines for carrying bi-directional
multiplexed address, data and control information.

5. The data processing system of Claim 4, wherein at least one of said bus lines carries a clock signal for synchronization
of signal transfer on the bus.

6. The data processing system of Claim 5, wherein said address information includes device select information used
to select said slave devices, whereby said bus does not require separate device-select lines connected directly to
individual slave devices.

7. The data processing system of Claim 5, wherein said plurality of bus lines transport said address, data and control 
information at both edges of said clock signal. 

8. The data processing system of Claim 6, wherein each of said slave devices has at least one modifiable identification 
register which contains a communication address which identifies each of said slave devices. 

9. The data processing system of Claim 8, wherein at least one of said slave devices is a memory device having at
least one memory array. 

10. The data processing system of Claim 9, wherein the address information comprises a base address of the memory
device to be accessed, an array address of a memory array within the memory device to be accessed, and addresses 
of rows and columns within the memory array to be accessed. 

11. The data processing system of Claim 4, wherein one of said bus lines carries a destination clock signal for the 
synchronization of information transfer from one of said slave devices to said master device and another of said
bus lines carries a source clock signal for the synchronization of information transfer from said master device to said 
slave devices. 

12. The data processing system of Claim 11, wherein said destination clock signal is driven by said one of said slave
devices and said source clock signal is driven by said master device.

13. The data processing system of Claim 12, wherein said destination clock signal is driven from the source clock signal
through a path substantially matched to a corresponding data signal path in said slave device. 

14. The data processing system of Claim 4, wherein said master device is an I/O device and said data processing 
system further comprises an I/O bus connected to said I/O device.

15. The data processing system of Claim 14, further comprising a plurality of said data processing systems connected 
in parallel to said I/O bus. 

16. The system of Claim 14, wherein each of said slave devices has an identification register which can be programmed
through bus commands on said I/O bus. 

17. The system of Claim 15, further comprising: 

a system master device; and 

chip select lines connecting said system master device to each of said data processing systems, wherein 
said chip select lines are used to initialize base addresses of slave devices in said data processing systems.

18. The system of Claim 17, wherein said base addresses are modified dynamically during operation of said data 
processing systems. 

19. The system of Claim 15, wherein said slave devices each comprise a disable register which is modifiable through 
said I/O bus. 

20. The system of Claim 17, wherein said system master device includes means to modify the base address of at least
one of said slave devices in one of said plurality of data processing systems. 

21. The system of Claim 17, wherein said system master device includes means to test the memory locations in said 
slave devices, and to disable at least one of said slave devices which has one or more memory bits that fails the test. 

22. The system of Claim 21, wherein said system master device further comprises means to set the base addresses
of said slave devices which pass the test such that these slave devices form a contiguous memory system. 

23. A memory device comprising: 

a plurality of memory modules coupled in parallel to a bus;
an identification register located within each of said memory modules;

programming means for writing a communication address to each identification register via said bus, wherein 
said memory modules are accessed by a command on said bus which includes the communication address of the 
memory module to be accessed. 

24. A memory device comprising: 

a memory array having a plurality of rows and columns of memory cells, wherein one of said rows is a redun- 
dant row and one of said columns is a redundant column; and 

an access control register coupled to said memory array, said access control register having a first programmable 
bit which, when enabled, provides access to said redundant row and a second programmable bit, which, when 
enabled, provides access to said redundant column.

25. A redundant memory system comprising: 

a plurality of memory chips coupled in parallel to a bus, wherein each of said memory chips comprises a 
plurality of memory modules coupled in parallel to said bus, each of said memory modules comprises a plurality of 
memory arrays coupled in parallel to said bus, and each of said memory arrays comprises a plurality of rows and
columns of memory cells; 

means for replacing a defective one of said memory chips with another one of said memory chips in response 
to signals on said bus; 

means for replacing a defective one of said memory modules with another one of said memory modules in 
response to signals on said bus; 

means for replacing a defective one of said memory arrays with another one of said memory arrays in 
response to signals on said bus; 

means for replacing a defective one of said rows of memory cells with another one of said rows of memory
cells in response to signals on said bus; and 

means for replacing a defective one of said columns of memory cells with another one of said columns of
memory cells in response to signals on said bus. 

26. A method of replacing a defective memory module with a redundant memory module, wherein both said defective
memory module and said redundant memory module are coupled in parallel to a bus, said method comprising the
steps of: 

providing said defective memory module with an identification register which stores a communication address 
which identifies said defective memory module; 

transmitting a signal through said bus to said defective memory module to disable said defective memory 
module; 

writing the communication address of said defective memory module to an identification register within said 
redundant memory module through said bus; and 

transmitting a signal through said bus to said redundant memory module to enable said redundant memory 
module. 

27. A method of replacing a defective row of memory cells in a memory array with a redundant row of memory cells, 
said method comprising the steps of: 

disabling said defective row of memory cells; 

programming the row address of said defective row of memory cells in a repair row address register through 
a command on a bus;

comparing the row address stored in said repair row address register with a current row address; and 
enabling said redundant row of memory cells when the row address stored in said repair row address register 
matches the current row address. 

28. A method of replacing a defective column of memory cells in a memory array with a redundant column of memory 
cells, said method comprising the steps of: 

disabling said defective column of memory cells; 

programming the column address of said defective column of memory cells in a repair column address register 
through a command on a bus; 

comparing the column address stored in said repair column address register with a current column address;
and

enabling said redundant column of memory cells when the column address stored in said repair column
address register matches the current column address.

29. A memory array comprising:

a plurality of memory cells arranged in rows and columns; 

a plurality of sense amplifier latches, wherein each of said sense amplifier latches is coupled to a corresponding
column of said memory cells;
a plurality of decoder circuits coupled to said sense amplifier latches;

a plurality of data amplifier circuits coupled to said decoder circuits, wherein said data amplifiers amplify
data signals read from said memory cells, thereby increasing the bandwidth of said memory array; and 

a plurality of terminals connecting said data amplifiers to bus lines which transfer data signals to and from 
said memory array. 

30. An address sequencing circuit comprising: 
a decoder circuit for receiving an initial column address signal and decoding the initial column address signal
to provide a decoded initial column address signal;

a barrel shifter circuit for receiving the decoded initial column address signal, wherein said barrel shifter circuit
is loaded with said decoded initial column address signal in response to a load signal and said barrel shifter circuit
sequentially provides column select signals in response to a clock signal; and

a buffer circuit for receiving said column select signals from said barrel shifter circuit and transmitting said
column select signals to the column control circuitry of a memory array.

31. A method for simultaneously writing data to a selected plurality of memory devices coupled in parallel to a bus, said
method comprising the steps of: 

issuing a broadcast-write select command on said bus to set a broadcast-write select register in each of a 
selected plurality of memory devices; 

broadcasting a write command on said bus; and 

simultaneously writing data in parallel to each of said selected plurality of memory devices in response to
said write command.

32. A method of reading data from a selected plurality of memory devices connected to a bus, said method comprising:
setting an interleave enable bit in each of a selected plurality of memory devices; 
automatically advancing a read address in each of said selected plurality of memory devices; and 
alternately reading data from each of said selected plurality of memory devices, whereby said memory 
devices output data in a time-multiplexed, round-robin manner to form a single burst of data on said bus. 



33. A method of reading data from a selected plurality of memory devices coupled in parallel to a bus, said method 
comprising the steps of:

issuing an interleaved-access select command on said bus to set an interleave enable bit in each of a selected 
plurality of memory devices; 

issuing a row access command on said bus to simultaneously perform row access operations in each of said 
selected plurality of memory devices; 
issuing a read command on said bus for alternately performing column access operations within each of said
selected plurality of memory devices, whereby data is alternately read from said memory devices to said bus in a
time multiplexed manner.



34. A method of writing data to a selected plurality of memory devices connected to a bus, said method comprising: 
setting an interleave enable bit in each of a selected plurality of memory devices;

automatically advancing a write address in each of said selected plurality of memory devices; and 
alternately writing data from said bus to each of said selected plurality of memory devices, whereby a single 
burst of data on said bus is alternately written to said memory devices in a time-multiplexed, round-robin manner. 



35. A method of writing data to a selected plurality of memory devices coupled in parallel to a bus, said method
comprising the steps of: 

issuing an interleaved-access select command on said bus to set an interleave enable bit in each of a selected 
plurality of memory devices; 

issuing a row access command on said bus to simultaneously perform row access operations in each of said 
selected plurality of memory devices;

issuing a write command for alternately writing data from said bus to columns in each of said selected plurality
of memory devices in a time multiplexed manner.

36. A resynchronization circuit for processing a stream of data values read from a memory system, said resynchronization
circuit comprising:

a first in, first out (FIFO) memory device which receives said stream of data values and a first clock signal
from said memory system, wherein said data values are sequentially read into said FIFO memory device in response
to said first clock signal;

a phase locked loop circuit which receives a second clock signal and in response generates an output clock
signal which leads said second clock signal, wherein said output clock signal is provided to said FIFO memory
device to cause said data values to be sequentially read from said FIFO memory device, thereby generating a
stream of data values which is synchronized with said second clock signal; and
a latency control circuit which enables said data values to be read from said FIFO memory device only after
a selectable delay period which immediately follows the initiation of said read operation from said memory system.

37. A method of resynchronizing a stream of data values comprising the steps of: 

detecting a read command signal which initiates a read operation of a stream of data values from a memory
system;

generating a read enable signal in response to said read command signal; 

writing said stream of data values to a first in, first out (FIFO) memory device in response to said read enable 
signal, wherein said writing is performed in response to a first clock signal; 

generating an output enable signal with a selectable delay in response to said read enable signal; 
transmitting said output enable signal to said FIFO memory device; and

reading said stream of data values out of said FIFO memory device in response to said output enable signal, 
wherein said reading is performed in response to a second clock signal, thereby synchronizing the stream of data 
values out of said FIFO memory device with said second clock signal. 

38. A passive termination circuit for controlling the termination voltage of a bus comprising:
a first terminal for receiving a first supply voltage; 
a second terminal for receiving a second supply voltage; 

a bus driver circuit having a first transistor of a first conductivity type coupled between said first terminal and 
said bus and a second transistor of a second conductivity type opposite said first conductivity type coupled between 
said second terminal and said bus;

a first clamping resistor coupled between said bus and said second terminal; and 
a second clamping resistor coupled between said bus and said first terminal, wherein the termination voltage 
of the bus is equal to one half of the difference between the first and second supply voltages. 

39. An active termination circuit for a bus comprising:

a first terminal for receiving a first supply voltage; 

a second terminal for receiving a second supply voltage; 

a bus driver circuit having a first transistor of a first conductivity type coupled between said first terminal and 
said bus and a second transistor of a second conductivity type opposite said first conductivity type coupled between 
said second terminal and said bus;

a third transistor of said first conductivity type coupled between said bus and said second terminal; 

a fourth transistor of said second conductivity type coupled between said bus and said first terminal; and 

an inverter having an input coupled to said bus and an output coupled to the gate of said third transistor and 
to the gate of said fourth transistor. 

40. A memory device comprising: 

a system bus; 

a memory array coupled to said system bus, said memory array having a plurality of rows and columns of 
memory cells, wherein one of said rows is a redundant row; 
a repair row address register for storing the row address of a defective row, wherein said repair row address
register is programmable from a command on said system bus;

an address comparator which compares the row address stored in said repair row address register with a 
current row address; and 

means responsive to an output signal of said address comparator, wherein said means enables access to
said redundant row when the row address stored in said repair row address register equals the current row address.

41. A memory device comprising: 

a system bus; 

a memory array coupled to said system bus, said memory array having a plurality of rows and columns of
memory cells, wherein one of said columns is a redundant column;

a repair column address register for storing the column address of a defective column, wherein said repair 
column address register is programmable from a command on said system bus; 

an address comparator which compares the column address stored in said repair column address register
with a current column address; and

means responsive to an output signal of said address comparator, wherein said means enables access to
said redundant column when the column address stored in said repair column address register equals the current
column address.